
How to Detect and Fix N+1 Queries in Spring Boot Before Production

10 min read
Evgenii Frolikov
Senior Java Architect | Expert in High-Load Systems & JVM Internals

TL;DR: The hardest part of an N+1 bug is not fixing it. The hardest part is noticing it before production. A Spring Boot endpoint can return the correct JSON, pass unit tests, and still execute 243 SQL queries instead of 1. This post shows how to detect N+1 patterns from real traces, choose the right fix in Spring Data JPA, and verify that the optimization did not change the API output.

Why N+1 Queries Keep Reaching Production

N+1 problems survive longer than they should because they rarely break correctness first. They break cost, latency, and scale.

A typical flow looks safe in code review:

  1. load a list of parent entities
  2. access a lazy collection or related entity inside a loop
  3. serialize the result into a response

The endpoint still returns 200 OK. The data looks right. The unit tests stay green because repositories and external dependencies are mocked. But in production, the request does this:

  • 1 query to load the parent list
  • N more queries to load each child or nested relation

At small volume, this looks like a non-event. Under real load, it becomes:

  • slow endpoints
  • high database CPU
  • connection pool pressure
  • noisy latency spikes
  • unexpected cloud cost

That is why N+1 is a runtime problem first. You need visibility into the actual SQL sequence, not just the Java code.

What an N+1 Query Looks Like in Spring Boot

The classic pattern appears in JPA/Hibernate-backed Spring Boot services.

Imagine a Student entity with a lazy association to courses:

@Entity
public class Student {

    @Id
    private Long id;

    private String firstName;
    private String lastName;

    @ManyToMany(fetch = FetchType.LAZY)
    @JoinTable(
        name = "enrollment",
        joinColumns = @JoinColumn(name = "student_id"),
        inverseJoinColumns = @JoinColumn(name = "course_id")
    )
    private List<Course> courses;
}

And a service like this:

public List<StudentDto> getStudents() {
    return studentRepository.findAll().stream()
            .map(studentMapper::toDto)
            .toList();
}

If studentMapper::toDto touches student.getCourses(), and the course mapping in turn touches course.getTeacher(), Hibernate may execute:

  • 1 query for all students
  • 1 query per student for courses
  • 1 query per course set for teacher data

The code looks harmless. The database says otherwise.

A Real Runtime Pattern

This is the kind of shape BitDive sees in traces:

SELECT * FROM student;

SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;

SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;

SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;

Same query shape, different bind parameter. Repeated dozens or hundreds of times inside one request. That is the signature.

How to Detect N+1 Queries Before Production

1. Count SQL Queries Per Request, Not Per Test Suite

The right unit of analysis is one business request.

If GET /students returns 50 students and runs:

  • 1 query for the root list
  • 50 queries for child collections

you already have an N+1 pattern, even if the response time still "looks fine" on a laptop.
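A low-friction way to see per-request counts in development is Hibernate's built-in SQL logging and statistics. A sketch of application.properties settings (standard Spring Boot and Hibernate property names; tune levels to your setup):

```properties
# Log every SQL statement Hibernate executes
logging.level.org.hibernate.SQL=DEBUG

# Collect Hibernate statistics (statement counts, cache activity)
spring.jpa.properties.hibernate.generate_statistics=true

# Log the collected statistics, including per-session JDBC statement counts
logging.level.org.hibernate.stat=DEBUG
```

With statistics enabled, Hibernate logs a per-session summary that includes the number of JDBC statements prepared and executed, which is enough to spot a request that should run 1 query but runs 51.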

2. Look for Repeated Query Shapes with Different IDs

The giveaway is repetition:

  • same SELECT
  • same joins
  • same projection
  • only the WHERE ... = ? bind value changes

This is why raw logs are not enough. Grepping logs is slow and noisy. You want one request trace with the SQL sequence grouped in order.
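If all you have is raw logs, grouping by shape can still be automated. A minimal sketch in plain Java (the normalization regexes are assumptions, crude on purpose; a real SQL parser would be more robust):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryShapes {

    // Collapse literals and whitespace so queries that differ only in
    // their bind values map to the same "shape" key.
    static String shapeOf(String sql) {
        return sql
                .replaceAll("'[^']*'", "?")     // string literals -> ?
                .replaceAll("\\b\\d+\\b", "?")  // numeric literals -> ?
                .replaceAll("\\s+", " ")        // normalize whitespace
                .trim()
                .toLowerCase();
    }

    // Count how often each shape occurs in one request's SQL sequence.
    // A shape with a count near the result cardinality is the N+1 signature.
    static Map<String, Integer> countShapes(List<String> statements) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : statements) {
            counts.merge(shapeOf(sql), 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding the trace from the previous section through countShapes would collapse all the per-student course selects into one shape with a large count, which is exactly the signal you are looking for.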

3. Compare Query Count to the Result Cardinality

If one endpoint returns:

  • 1 order and runs 2 queries, that may be normal
  • 100 students and runs 243 queries, that is not normal

The important ratio is not only "slow or fast." It is "how many queries did this request need to produce one response?"
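That ratio can be turned into a simple heuristic check, for example in an integration test. A sketch in plain Java (the fixed-query budget is a hypothetical threshold you would tune per endpoint):

```java
public class QueryBudget {

    // A constant number of queries per request is fine; a query count
    // that grows with the result size is the classic N+1 signature.
    static boolean looksLikeNPlusOne(int resultCount, int queryCount) {
        if (resultCount <= 1) {
            return false; // too small a sample to tell
        }
        // Hypothetical budget: a handful of fixed queries per request.
        int fixedBudget = 5;
        return queryCount >= resultCount && queryCount > fixedBudget;
    }
}
```

With this check, 100 students and 243 queries fails loudly, while 100 students and 3 queries passes, regardless of how fast either ran on a laptop.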

4. Compare Before and After, Not Just Before

Most teams detect N+1 only after the regression has already shipped. A safer pattern is:

  1. capture the current request trace
  2. apply the optimization
  3. capture the same request again
  4. compare SQL count, SQL shape, timings, and response payload

That way you prove both sides:

  • performance improved
  • business behavior stayed the same
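The compare step can be as mechanical as diffing two captured summaries. A minimal sketch (the record fields are assumptions about what your tracing setup exports per request):

```java
public class TraceDiff {

    // Hypothetical per-request summary exported by your tracing setup.
    record TraceSummary(int sqlCount, long durationMillis, String responseBodyHash) {}

    // A safe optimization: strictly fewer queries, identical response body.
    static boolean isSafeImprovement(TraceSummary before, TraceSummary after) {
        return after.sqlCount() < before.sqlCount()
                && after.responseBodyHash().equals(before.responseBodyHash());
    }
}
```

Hashing the response body (after stripping volatile fields like timestamps) is what catches the silent regressions: an optimization that drops a nested field will change the hash even though the endpoint still returns 200 OK.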

Why Unit Tests Usually Miss N+1

A mock-heavy unit test cannot see:

  • real SQL count
  • lazy loading behavior
  • serialization side effects
  • transaction boundaries
  • mapper access to nested associations

It only sees the objects you fed into the mock.

This is why N+1 bugs often appear after an innocent request like:

  • "add courses to the response"
  • "include teacher name in the DTO"
  • "return full details instead of IDs"

The Java change is small. The SQL explosion is invisible unless you capture the runtime path.

The 4 Most Useful Fixes in Spring Boot

There is no single universal fix. The right choice depends on the endpoint shape.

1. @EntityGraph for Repository-Level Fetch Plans

If one repository method needs a richer graph than the default mapping, @EntityGraph is often the cleanest fix.

public interface StudentRepository extends JpaRepository<Student, Long> {

    @EntityGraph(attributePaths = {"courses", "courses.teacher"})
    @Query("SELECT DISTINCT s FROM Student s ORDER BY s.lastName ASC, s.firstName ASC")
    List<Student> findAllWithCoursesAndTeachersEntityGraph();
}

Then the service becomes:

public List<Student> findAll() {
    return studentRepository.findAllWithCoursesAndTeachersEntityGraph();
}

Use this when:

  • one endpoint needs the richer graph
  • the fetch plan is request-specific
  • you want the intention close to the repository method

2. JOIN FETCH for Explicit Query Control

If the query is highly specific and you want exact control in JPQL, a fetch join is straightforward.

@Query("""
        select distinct s
        from Student s
        left join fetch s.courses c
        left join fetch c.teacher
        order by s.lastName, s.firstName
        """)
List<Student> findAllWithCoursesAndTeachers();

Use this when:

  • the endpoint is read-heavy
  • the fetch shape is fixed
  • you want the join strategy visible in the query itself

Be careful with very wide graphs. One big join can trade N+1 for row explosion: fetch-joining multiple collections multiplies their rows into a cartesian product, and Hibernate has to deduplicate the result in memory.

3. DTO Projections for Read Models

If the endpoint is purely read-oriented and does not need managed entities, a projection is often better than loading a rich entity graph and serializing it.

Use a DTO projection when:

  • you control the exact response shape
  • you only need a subset of columns
  • the endpoint is query-driven, not domain-behavior-driven

This can be the cleanest fix for list endpoints and reporting APIs.
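As a sketch, a Spring Data JPA constructor projection for the student list could look like this (the DTO name and fields are hypothetical, not from the original mapping; `select new` with a Java record requires Hibernate 6 / Spring Boot 3):

```java
// Hypothetical read model for the list endpoint.
public record StudentSummary(Long id, String firstName, String lastName) {}

public interface StudentRepository extends JpaRepository<Student, Long> {

    // One statement, no managed entities, nothing lazy left to trigger
    // extra selects during serialization.
    @Query("""
            select new com.example.StudentSummary(s.id, s.firstName, s.lastName)
            from Student s
            order by s.lastName, s.firstName
            """)
    List<StudentSummary> findAllSummaries();
}
```

Because the response shape is fixed at the query, a later "just add one more getter" change cannot silently reintroduce lazy loading in this endpoint.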

4. Batch Fetching for Some Association Patterns

In some cases, batch fetching can reduce the number of secondary selects without forcing one huge join. This is useful when one giant fetch join would create too many duplicate rows.

Use it carefully. Batch fetching can improve the pattern, but it does not make the query plan obvious to reviewers. You still need trace-level evidence that the request improved.
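In Hibernate the usual knob is `@BatchSize` on the association (or `hibernate.default_batch_fetch_size` globally). A sketch on the entity from earlier (the batch size of 50 is an illustrative value, not a recommendation):

```java
@Entity
public class Student {

    @Id
    private Long id;

    // Instead of one SELECT per student, Hibernate loads courses for up
    // to 50 students at a time using a WHERE student_id IN (?, ?, ...) query,
    // turning N secondary selects into roughly N/50.
    @org.hibernate.annotations.BatchSize(size = 50)
    @ManyToMany(fetch = FetchType.LAZY)
    private List<Course> courses;
}
```

Note this reduces N+1 to N/batch+1 rather than eliminating it, which is why trace-level verification still matters.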

What Not to Do

Do Not Default Everything to FetchType.EAGER

EAGER is not a real N+1 strategy. It is a blunt mapping decision. It can:

  • hide performance problems in one path
  • create oversized object graphs in another
  • trigger unnecessary joins everywhere

Fix the request shape, not the whole domain model by default.

Do Not Trust a Faster Response Alone

If one request went from 240 queries to 4, that is good. But you still need to verify:

  • did the response body change?
  • did nested fields disappear?
  • did ordering change?
  • did headers or error handling drift?

Performance optimization that silently changes the API contract is still a regression.

A Safer N+1 Workflow with BitDive

The most reliable workflow is:

Step 1: Capture the Baseline Request

Trigger the real endpoint with BitDive attached and inspect:

  • SQL sequence
  • query count
  • timings
  • response body

Step 2: Identify the Repeated Query Pattern

BitDive shows the full execution tree and SQL sequence for one request, so you can see exactly where the repeated selects begin.

For a concrete walkthrough, see Cursor AI & BitDive: Diagnosing N+1 Queries with Runtime Context.

Step 3: Apply the Smallest Reasonable Fix

Typical fixes:

  • add @EntityGraph
  • replace findAll() with a fetch-specific repository method
  • switch the endpoint to a DTO projection
  • adjust the association access path

Step 4: Capture the Same Request Again

Run the same endpoint after the change and capture a second trace.

Step 5: Compare Before and After

This is the proof step. Compare:

  • SQL count
  • SQL shape
  • response payload
  • method return values
  • downstream behavior

For example:

  Metric          Before      After
  Response time   94 ms       13 ms
  SQL queries     243         1
  Response body   identical   identical

That is a safe optimization. Faster and behaviorally stable.

Step 6: Preserve It as Regression Memory

Once the optimized request is verified, turn that execution into replay-based protection so the N+1 pattern does not come back six weeks later.

That is where trace-based testing becomes part of performance engineering, not just functional testing.

Why This Matters for AI-Assisted Development

N+1 is a perfect example of where AI needs runtime context.

An agent reading source code can see:

  • studentRepository.findAll()
  • student.getCourses()
  • a mapper calling nested getters

What it cannot see from static code alone is:

  • the actual SQL sequence
  • how many times the query repeats
  • whether the "fix" changed the API output

That is why BitDive's runtime context for AI agents matters. The agent can inspect the trace, apply a targeted fix, then compare before and after traces to verify it improved the query plan without breaking the response.

Find N+1 Queries from Real Runtime Traces

BitDive captures SQL activity, method calls, and response payloads for one real request. Detect repeated query patterns, apply a focused fix, and verify that performance improved without changing the API contract.

Explore code-level observability