How to Detect and Fix N+1 Queries in Spring Boot Before Production

TL;DR: The hardest part of an N+1 bug is not fixing it. The hardest part is noticing it before production. A Spring Boot endpoint can return the correct JSON, pass unit tests, and still execute 243 SQL queries instead of 1. This post shows how to detect N+1 patterns from real traces, choose the right fix in Spring Data JPA, and verify that the optimization did not change the API output.
Why N+1 Queries Keep Reaching Production
N+1 problems survive longer than they should because they rarely break correctness first. They break cost, latency, and scale.
A typical flow looks safe in code review:
- load a list of parent entities
- access a lazy collection or related entity inside a loop
- serialize the result into a response
The endpoint still returns 200 OK. The data looks right. The unit tests stay green because repositories and external dependencies are mocked. But in production, the request does this:
- 1 query to load the parent list
- N more queries to load each child or nested relation
At small volume, this looks like a non-event. Under real load, it becomes:
- slow endpoints
- high database CPU
- connection pool pressure
- noisy latency spikes
- unexpected cloud cost
That is why N+1 is a runtime problem first. You need visibility into the actual SQL sequence, not just the Java code.
What an N+1 Query Looks Like in Spring Boot
The classic pattern appears in JPA/Hibernate-backed Spring Boot services.
Imagine a Student entity with a lazy association to courses:
@Entity
public class Student {
@Id
private Long id;
private String firstName;
private String lastName;
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(
name = "enrollment",
joinColumns = @JoinColumn(name = "student_id"),
inverseJoinColumns = @JoinColumn(name = "course_id")
)
private List<Course> courses;
}
And a service like this:
public List<StudentDto> getStudents() {
return studentRepository.findAll().stream()
.map(studentMapper::toDto)
.toList();
}
If studentMapper::toDto touches student.getCourses() and each course touches course.getTeacher(), Hibernate may execute:
- 1 query for all students
- 1 query per student for courses
- 1 query per course set for teacher data
The code looks harmless. The database says otherwise.
A Real Runtime Pattern
This is the kind of shape BitDive sees in traces:
SELECT * FROM student;
SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;
SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;
SELECT c.*, t.*
FROM enrollment e
JOIN course c ON c.id = e.course_id
LEFT JOIN teacher t ON t.id = c.teacher_id
WHERE e.student_id = ?;
Same query shape, different bind parameter. Repeated dozens or hundreds of times inside one request. That is the signature.
How to Detect N+1 Queries Before Production
1. Count SQL Queries Per Request, Not Per Test Suite
The right unit of analysis is one business request.
If GET /students returns 50 students and runs:
- 1 query for the root list
- 50 queries for child collections
you already have an N+1 pattern, even if the response time still "looks fine" on a laptop.
2. Look for Repeated Query Shapes with Different IDs
The giveaway is repetition:
- same
SELECT - same joins
- same projection
- only the
WHERE ... = ?bind value changes
This is why raw logs are not enough. Grepping logs is slow and noisy. You want one request trace with the SQL sequence grouped in order.
3. Compare Query Count to the Result Cardinality
If one endpoint returns:
- 1 order and runs 2 queries, that may be normal
- 100 students and runs 243 queries, that is not normal
The important ratio is not only "slow or fast." It is "how many queries did this request need to produce one response?"
4. Compare Before and After, Not Just Before
Most teams detect N+1 only after the regression has already shipped. A safer pattern is:
- capture the current request trace
- apply the optimization
- capture the same request again
- compare SQL count, SQL shape, timings, and response payload
That way you prove both sides:
- performance improved
- business behavior stayed the same
Why Unit Tests Usually Miss N+1
A mock-heavy unit test cannot see:
- real SQL count
- lazy loading behavior
- serialization side effects
- transaction boundaries
- mapper access to nested associations
It only sees the objects you fed into the mock.
This is why N+1 bugs often appear after an innocent request like:
- "add courses to the response"
- "include teacher name in the DTO"
- "return full details instead of IDs"
The Java change is small. The SQL explosion is invisible unless you capture the runtime path.
The 4 Most Useful Fixes in Spring Boot
There is no single universal fix. The right choice depends on the endpoint shape.
1. @EntityGraph for Repository-Level Fetch Plans
If one repository method needs a richer graph than the default mapping, @EntityGraph is often the cleanest fix.
public interface StudentRepository extends JpaRepository<Student, Long> {
@EntityGraph(attributePaths = {"courses", "courses.teacher"})
@Query("SELECT DISTINCT s FROM Student s ORDER BY s.lastName ASC, s.firstName ASC")
List<Student> findAllWithCoursesAndTeachersEntityGraph();
}
Then the service becomes:
public List<Student> findAll() {
return studentRepository.findAllWithCoursesAndTeachersEntityGraph();
}
Use this when:
- one endpoint needs the richer graph
- the fetch plan is request-specific
- you want the intention close to the repository method
2. JOIN FETCH for Explicit Query Control
If the query is highly specific and you want exact control in JPQL, a fetch join is straightforward.
@Query("""
select distinct s
from Student s
left join fetch s.courses c
left join fetch c.teacher
order by s.lastName, s.firstName
""")
List<Student> findAllWithCoursesAndTeachers();
Use this when:
- the endpoint is read-heavy
- the fetch shape is fixed
- you want the join strategy visible in the query itself
Be careful with very wide graphs. One big join can trade N+1 for row explosion.
3. DTO Projections for Read Models
If the endpoint is purely read-oriented and does not need managed entities, a projection is often better than loading a rich entity graph and serializing it.
Use a DTO projection when:
- you control the exact response shape
- you only need a subset of columns
- the endpoint is query-driven, not domain-behavior-driven
This can be the cleanest fix for list endpoints and reporting APIs.
4. Batch Fetching for Some Association Patterns
In some cases, batch fetching can reduce the number of secondary selects without forcing one huge join. This is useful when one giant fetch join would create too many duplicate rows.
Use it carefully. Batch fetching can improve the pattern, but it does not make the query plan obvious to reviewers. You still need trace-level evidence that the request improved.
What Not to Do
Do Not Default Everything to FetchType.EAGER
EAGER is not a real N+1 strategy. It is a blunt mapping decision. It can:
- hide performance problems in one path
- create oversized object graphs in another
- trigger unnecessary joins everywhere
Fix the request shape, not the whole domain model by default.
Do Not Trust a Faster Response Alone
If one request went from 240 queries to 4, that is good. But you still need to verify:
- did the response body change?
- did nested fields disappear?
- did ordering change?
- did headers or error handling drift?
Performance optimization that silently changes the API contract is still a regression.
A Safer N+1 Workflow with BitDive
The most reliable workflow is:
Step 1: Capture the Baseline Request
Trigger the real endpoint with BitDive attached and inspect:
- SQL sequence
- query count
- timings
- response body
Step 2: Identify the Repeated Query Pattern
BitDive shows the full execution tree and SQL sequence for one request, so you can see exactly where the repeated selects begin.
For a concrete walkthrough, see Cursor AI & BitDive: Diagnosing N+1 Queries with Runtime Context.
Step 3: Apply the Smallest Reasonable Fix
Typical fixes:
- add
@EntityGraph - replace
findAll()with a fetch-specific repository method - switch the endpoint to a DTO projection
- adjust the association access path
Step 4: Capture the Same Request Again
Run the same endpoint after the change and capture a second trace.
Step 5: Compare Before and After
This is the proof step. Compare:
- SQL count
- SQL shape
- response payload
- method return values
- downstream behavior
For example:
| Metric | Before | After |
|---|---|---|
| Response time | 94ms | 13ms |
| SQL queries | 243 | 1 |
| Response body | identical | identical |
That is a safe optimization. Faster and behaviorally stable.
Step 6: Preserve It as Regression Memory
Once the optimized request is verified, turn that execution into replay-based protection so the N+1 pattern does not come back six weeks later.
That is where trace-based testing becomes part of performance engineering, not just functional testing.
Why This Matters for AI-Assisted Development
N+1 is a perfect example of where AI needs runtime context.
An agent reading source code can see:
studentRepository.findAll()student.getCourses()- a mapper calling nested getters
What it cannot see from static code alone is:
- the actual SQL sequence
- how many times the query repeats
- whether the "fix" changed the API output
That is why BitDive's runtime context for AI agents matters. The agent can inspect the trace, apply a targeted fix, then compare before and after traces to verify it improved the query plan without breaking the response.
Find N+1 Queries from Real Runtime Traces
BitDive captures SQL activity, method calls, and response payloads for one real request. Detect repeated query patterns, apply a focused fix, and verify that performance improved without changing the API contract.
Explore code-level observability