Why do conventional unit tests miss runtime regressions?

Conventional unit tests often verify assumptions through hand-written mocks. They miss regressions caused by DTO serialization drift, query count changes, changed error response shapes, or transaction proxy issues that only appear during real runtime execution.

How does trace-based testing work in practice?

It works in five steps: 1. Capture a real execution in the JVM. 2. Preserve runtime facts like payloads and SQL. 3. Virtualize service boundaries. 4. Replay the scenario against new code. 5. Compare before and after behavior to detect drift.

What Is Trace-Based Testing? A Practical Guide for Java and Spring Boot Teams

April 17, 2026 · 9 min read

Dmitry Turmyshev

Product Manager | Developer Experience and Software Quality

TL;DR: Trace-based testing is a software testing approach where tests are built from real execution traces captured from a running application. Instead of writing mock data manually or using AI to guess test cases from source code, trace-based testing records actual method calls, SQL queries, and API responses, then replays them as standard JUnit tests.

Teams usually discover trace-based testing when they hit the same wall: green unit tests, green code review, and still a production regression after a harmless-looking change.

In Java and Spring Boot systems, the problem is rarely "we had zero tests." The problem is that traditional tests often verify a simplified version of reality.

If your Mockito setup says the repository returns one DTO, the Feign client returns a perfect payload, and the clock always behaves, you are mostly testing your assumptions. Production, unfortunately, does not run inside your assumptions.

What Trace-Based Testing Actually Means

Trace-based testing starts from a real runtime execution.

A system like BitDive captures the scenario as it actually happened in development, staging, or production. That recorded scenario can include:

HTTP request payloads and headers
internal method call chains
SQL queries with parameters and returned rows
downstream REST and Feign requests and responses
Kafka publishes and consumed messages
exceptions, error paths, and return values
volatile values like time, UUIDs, and randomness

That trace becomes the source of truth for verification.

Instead of writing a synthetic test from memory, you replay the real scenario against the changed code and verify that the runtime behavior stayed correct.

Why Conventional Tests Miss Runtime Regressions

Unit tests are useful. Integration tests are useful. End-to-end tests are useful. But they each leave gaps.

Mock-heavy unit tests verify assumptions

This is where many false greens come from. A mock-based test rarely catches:

DTO or serialization drift
query count regressions
changed error response shape
missing headers in downstream calls
transaction proxy issues
filters, interceptors, or validation changing runtime behavior

The test stays green because the mocked world never behaved like production in the first place.

E2E coverage is too expensive to be broad enough

Full environments are realistic, but slow and costly. Most teams keep only a thin E2E layer, which leaves a large middle zone unprotected.

AI-assisted coding amplifies the problem

An AI-generated patch can compile, pass static review, and still change:

the shape of a JSON contract
the number of SQL queries
the order of downstream calls
the structure of an error response
the internal path a request takes through the service

These are runtime regressions. You need runtime evidence to catch them reliably.

How Trace-Based Testing Works in Practice

At a high level, the loop is straightforward.

1. Capture a Real Execution

BitDive records one real request as it moves through the running JVM.

2. Preserve the Runtime Facts That Matter

The trace keeps the real behavior that defines the scenario: payloads, SQL, downstream responses, timings, errors, and method flow.
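
To make the list of preserved facts concrete, here is a minimal sketch of what one recorded scenario could look like as a data structure. All type and field names (`RecordedTrace`, `SqlCall`, `DownstreamCall`) are hypothetical and chosen for illustration; they are not BitDive's actual storage format.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of one SQL statement seen during the recorded run.
record SqlCall(String sql, List<Object> parameters, int rowCount) {}

// Hypothetical model of one call that left the service boundary.
record DownstreamCall(String method, String url, Map<String, String> headers,
                      int status, String responseBody) {}

// Hypothetical model of a whole captured scenario.
record RecordedTrace(
        String scenarioId,
        String requestMethod,
        String requestPath,
        String requestBody,
        List<SqlCall> sqlCalls,
        List<DownstreamCall> downstreamCalls,
        int responseStatus,
        String responseBody) {

    // Number of SQL statements the scenario executed.
    int sqlCount() {
        return sqlCalls.size();
    }
}

final class TraceModelDemo {
    // Builds an invented sample trace; the values are illustrative only.
    static RecordedTrace sample() {
        return new RecordedTrace(
                "demo-scenario",
                "GET", "/policies/42", "",
                List.of(new SqlCall("select * from policy where id = ?", List.of(42), 1)),
                List.of(new DownstreamCall("GET", "http://pricing/quotes/42",
                        Map.of("X-Request-Id", "abc"), 200, "{\"premium\":100}")),
                200, "{\"id\":42,\"premium\":100}");
    }

    public static void main(String[] args) {
        RecordedTrace trace = sample();
        System.out.println(trace.scenarioId() + " ran " + trace.sqlCount() + " SQL statement(s)");
    }
}
```

Keeping SQL and downstream calls as ordered lists, rather than as counts only, is what would let a later comparison talk about query shape and call order as well as totals.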

3. Separate the Service Boundary

BitDive keeps the internal chain real and virtualizes only what sits outside the service boundary.

That means your real code still runs, while recorded boundary interactions are replayed instead of calling live infrastructure.
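
The idea of keeping internal code real while replaying the boundary can be sketched in plain Java. The names below (`PricingClient`, `ReplayingPricingClient`, `PolicyService`) are hypothetical stand-ins, and the map-based lookup is an assumption made for brevity, not a description of how BitDive virtualizes calls.

```java
import java.util.Map;

// Hypothetical boundary interface; in a real service this could be a Feign client.
interface PricingClient {
    String fetchQuote(String policyId);
}

// Replays recorded responses instead of calling live infrastructure.
final class ReplayingPricingClient implements PricingClient {
    private final Map<String, String> recordedResponses;

    ReplayingPricingClient(Map<String, String> recordedResponses) {
        this.recordedResponses = Map.copyOf(recordedResponses);
    }

    @Override
    public String fetchQuote(String policyId) {
        String recorded = recordedResponses.get(policyId);
        if (recorded == null) {
            // A call the trace never saw is itself a sign that behavior changed.
            throw new IllegalStateException("No recorded response for policy " + policyId);
        }
        return recorded;
    }
}

// Internal logic: this class runs unchanged during replay.
final class PolicyService {
    private final PricingClient pricing;

    PolicyService(PricingClient pricing) {
        this.pricing = pricing;
    }

    String describe(String policyId) {
        return "policy=" + policyId + " quote=" + pricing.fetchQuote(policyId);
    }
}

final class BoundaryReplayDemo {
    public static void main(String[] args) {
        PricingClient replay = new ReplayingPricingClient(Map.of("42", "{\"premium\":100}"));
        System.out.println(new PolicyService(replay).describe("42"));
    }
}
```

Unlike a hand-written mock, the response here is meant to come from a recording, so the stub cannot be more optimistic than what the real dependency actually returned.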

4. Replay the Scenario Against the New Code

When the test runs, BitDive re-executes the scenario against the changed code in a deterministic environment.

For a Spring Boot replay integration test, that usually means:

HTTP request -> filters -> validation -> service logic -> transaction boundaries -> repositories -> response serialization
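
A deterministic replay also has to pin volatile values such as time and UUIDs. One common way to do this in plain Java is to inject a `java.time.Clock` and an ID supplier; the sketch below assumes that style. `AuditStamp` and `recordedIds` are hypothetical names, and this is not a claim about how BitDive pins these values internally.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical component that takes its time and ID sources as dependencies.
final class AuditStamp {
    private final Clock clock;
    private final Supplier<UUID> ids;

    AuditStamp(Clock clock, Supplier<UUID> ids) {
        this.clock = clock;
        this.ids = ids;
    }

    String next() {
        return ids.get() + "@" + Instant.now(clock);
    }
}

final class DeterministicReplayDemo {
    // Hands out UUIDs in the order they were recorded.
    static Supplier<UUID> recordedIds(List<UUID> recorded) {
        Deque<UUID> queue = new ArrayDeque<>(recorded);
        return queue::remove;
    }

    public static void main(String[] args) {
        // Freeze time at an invented instant standing in for the recorded one.
        Clock frozen = Clock.fixed(Instant.parse("2026-04-17T10:15:30Z"), ZoneOffset.UTC);
        UUID recorded = UUID.fromString("00000000-0000-0000-0000-000000000001");

        AuditStamp stamp = new AuditStamp(frozen, recordedIds(List.of(recorded)));
        System.out.println(stamp.next());
    }
}
```

With both sources pinned, two runs of the same scenario produce identical output, so any remaining difference comes from the code change rather than from the clock.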

5. Compare Before and After Behavior

This is the proof step. BitDive can compare:

response payloads
SQL count and shape
downstream calls and headers
error responses
call path changes
timings and side effects

If behavior drifted in a meaningful way, the verification fails.
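
The comparison step can be pictured as a diff over two behavior snapshots. The sketch below is a deliberately simplified assumption: `BehaviorSnapshot` and `DriftDetector` are hypothetical names, and it compares values exactly, whereas a real comparison would need rules for ignoring noise that is not meaningful.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced view of the facts compared between two runs.
record BehaviorSnapshot(int status, String responseBody, int sqlCount,
                        List<String> downstreamCalls) {}

final class DriftDetector {
    // Returns one human-readable entry per fact that differs between the runs.
    static List<String> diff(BehaviorSnapshot before, BehaviorSnapshot after) {
        List<String> drift = new ArrayList<>();
        if (before.status() != after.status()) {
            drift.add("status: " + before.status() + " -> " + after.status());
        }
        if (!before.responseBody().equals(after.responseBody())) {
            drift.add("response body changed");
        }
        if (before.sqlCount() != after.sqlCount()) {
            drift.add("SQL count: " + before.sqlCount() + " -> " + after.sqlCount());
        }
        if (!before.downstreamCalls().equals(after.downstreamCalls())) {
            drift.add("downstream calls: " + before.downstreamCalls()
                    + " -> " + after.downstreamCalls());
        }
        return drift;
    }

    public static void main(String[] args) {
        BehaviorSnapshot before = new BehaviorSnapshot(200, "{}", 2, List.of("GET /quotes/42"));
        BehaviorSnapshot after = new BehaviorSnapshot(200, "{}", 40,
                List.of("GET /quotes/42", "POST /audit"));
        // Same status and body, yet two facts drifted.
        diff(before, after).forEach(System.out::println);
    }
}
```

In this invented example the response is byte-for-byte identical, which is exactly the case a status-and-body assertion would wave through.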

Why This Model Is Stronger Than Hand-Written Mocks

The value is not that the test looks shorter. The value is that the source of truth is better.

With a manual test, you often write:

mock beans
WireMock stubs
fixture builders
setup code for infrastructure dependencies
assertions that overfit to technical noise

With trace-based testing, the behavior already exists in recorded form.

A replay test can stay very small:

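import java.util.Arrays;
import java.util.List;
// BitDive imports (ReplayTestBase, ReplayTestConfiguration, ReplayTestUtils) are omitted here.

class PolicyControllerReplayTest extends ReplayTestBase {

    @Override
    protected List<ReplayTestConfiguration> getTestConfigurations() {
        // The ID below refers to one recorded scenario to replay.
        return ReplayTestUtils.fromRestApiWithJsonContentConfigFile(
                Arrays.asList("0d46c175-4926-4fb6-ad2f-866acdc72996")
        );
    }
}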

What matters is not that the code is tiny, but that the scenario ID points to a real execution rather than an invented test world.

What Trace-Based Testing Catches Well

Trace-based testing is strongest when the bug is subtle in code review but obvious at runtime.

API and Serialization Drift

A DTO refactor, date format change, or enum serialization shift can compile cleanly and still break consumers.
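
A date format change is an easy case to reproduce. In the sketch below the JSON is built by hand to stay dependency-free; that is an assumption for illustration, standing in for whatever JSON mapper the service really uses. `DateFormatDrift` and the `effectiveDate` field are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateFormatDrift {
    // Serialization before the change: ISO-8601 date.
    static String before(LocalDate date) {
        return "{\"effectiveDate\":\"" + DateTimeFormatter.ISO_LOCAL_DATE.format(date) + "\"}";
    }

    // Serialization after a refactor that swapped in a custom pattern.
    static String after(LocalDate date) {
        return "{\"effectiveDate\":\""
                + DateTimeFormatter.ofPattern("dd/MM/yyyy").format(date) + "\"}";
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2026, 4, 17);
        System.out.println(before(date)); // {"effectiveDate":"2026-04-17"}
        System.out.println(after(date));  // {"effectiveDate":"17/04/2026"}
    }
}
```

A test that asserts on the `LocalDate` field of the DTO passes in both versions. Only a comparison of the serialized payload shows that consumers now receive a different string.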

Error Contract Regressions

The service may still return 404, but with a different body shape or headers than upstream systems expect.
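
One way to picture an error contract check is to compare the field names of the recorded error body with those of the new one. The sketch below assumes both bodies have already been parsed into maps; `ErrorShapeCheck` and the field names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ErrorShapeCheck {
    // Fields present in the recorded body but absent from the current one.
    static Set<String> missingFields(Map<String, ?> recorded, Map<String, ?> current) {
        Set<String> missing = new TreeSet<>(recorded.keySet());
        missing.removeAll(current.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Both responses are 404; only the body shape differs.
        Map<String, Object> recorded = Map.of(
                "code", "POLICY_NOT_FOUND",
                "message", "Policy 42 not found",
                "traceId", "abc");
        Map<String, Object> current = Map.of(
                "error", "Not Found",
                "message", "Policy 42 not found");

        System.out.println("missing fields: " + missingFields(recorded, current));
        // prints: missing fields: [code, traceId]
    }
}
```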

N+1 Queries and SQL Regressions

The endpoint still returns the same JSON, but now runs 40 queries instead of 2.
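
The shape of an N+1 regression can be shown with a small in-memory stand-in for a database that counts statements. `CountingDb` and its methods are hypothetical; a real trace would record the actual SQL instead of a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database that counts executed statements.
final class CountingDb {
    private int queries = 0;

    List<Integer> findPolicyIds() {
        queries++;
        return List.of(1, 2, 3);
    }

    String findHolder(int policyId) {
        queries++;
        return "holder-" + policyId;
    }

    List<String> findHolders(List<Integer> policyIds) {
        queries++; // one batched statement for all IDs
        List<String> holders = new ArrayList<>();
        for (int id : policyIds) {
            holders.add("holder-" + id);
        }
        return holders;
    }

    int queryCount() {
        return queries;
    }
}

final class NPlusOneDemo {
    // One query for the IDs, then one more per ID: the N+1 pattern.
    static List<String> loadOneByOne(CountingDb db) {
        List<String> holders = new ArrayList<>();
        for (int id : db.findPolicyIds()) {
            holders.add(db.findHolder(id));
        }
        return holders;
    }

    // One query for the IDs, one batched query for the holders.
    static List<String> loadBatched(CountingDb db) {
        return db.findHolders(db.findPolicyIds());
    }

    public static void main(String[] args) {
        CountingDb slow = new CountingDb();
        CountingDb fast = new CountingDb();
        boolean sameResult = loadOneByOne(slow).equals(loadBatched(fast));
        System.out.println("same result: " + sameResult);       // true
        System.out.println("one by one: " + slow.queryCount()); // 4
        System.out.println("batched: " + fast.queryCount());    // 2
    }
}
```

Both loaders return the same list, so an assertion on the result alone cannot tell them apart. The query count is the only observable difference.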

Unexpected External Calls

An extra Feign request, Kafka publish, or downstream call may introduce cost or break integration guarantees.

Full Spring Context Issues

Some bugs appear only when real wiring is involved:

transaction proxies
validation behavior
repository and schema mismatches
request filters
mapper behavior under real request flow

This is why trace-based testing fits especially well with Spring Boot integration testing.
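
Transaction proxy issues are a good example of wiring-only bugs. The sketch below uses a plain JDK dynamic proxy as an analogy for proxy-based interception; it is not Spring's actual machinery, and `TransferService`, `PlainTransferService`, and the counter are hypothetical. It shows why a call from one method to another on the same object bypasses the proxy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

interface TransferService {
    void transfer();
    void audit();
}

final class PlainTransferService implements TransferService {
    @Override
    public void transfer() {
        // Self-invocation: this call goes to `this`, not through the proxy.
        audit();
    }

    @Override
    public void audit() {
        // Imagine this method was meant to run in its own transaction.
    }
}

final class ProxyDemo {
    // Wraps the target so every call made through the proxy opens one "transaction".
    static TransferService transactional(TransferService target, AtomicInteger opened) {
        InvocationHandler handler = (proxy, method, args) -> {
            opened.incrementAndGet();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (TransferService) Proxy.newProxyInstance(
                TransferService.class.getClassLoader(),
                new Class<?>[] {TransferService.class},
                handler);
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        TransferService service = transactional(new PlainTransferService(), opened);

        service.transfer(); // runs transfer() and, inside it, audit()
        System.out.println("transactions opened: " + opened.get()); // 1, not 2
    }
}
```

A unit test that instantiates `PlainTransferService` directly never involves a proxy at all, so it has no way to notice whether the inner call is intercepted.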

Trace-Based Testing vs Unit Tests vs E2E

Approach	Source of truth	Main strength	Main weakness
Unit tests	hand-written expectations and mocks	fast for local logic	weak against runtime seams
E2E tests	live full-system execution	realistic	slow and expensive to scale
Trace-based testing	real captured runtime behavior	realistic and repeatable	requires capture and replay discipline

This is not a reason to delete all your unit tests. Pure logic still benefits from ordinary unit coverage.

The point is different: trace-based testing covers the runtime surface that mock-heavy suites usually miss.

Why It Matters for AI-Native Development

BitDive uses trace-based testing as part of a wider deterministic verification workflow.

The first pillar is runtime context for AI agents. Through MCP, an agent can inspect real payloads, SQL queries, call chains, and downstream operations before changing code.

The second pillar is replay-based regression memory. Once behavior is verified, the same scenario becomes deterministic protection in local development and CI.

That changes the standard from:

"The patch looks plausible."

to:

"The patch was verified against the same runtime scenario and preserved the intended behavior."

That is the practical difference between AI-assisted coding and AI-assisted engineering.

When to Use Trace-Based Testing First

This approach is especially valuable when:

you are upgrading Spring Boot and need runtime proof that APIs did not drift
you are refactoring legacy code with poor existing tests
you need to turn a production bug into permanent regression protection
mocking the dependency graph is too expensive
AI is contributing code and you need stronger proof than lint plus unit tests
you care about SQL behavior, contract safety, and call flow, not just final status codes

Where to Go Next in BitDive

If you want the product view, start here:

What Is Trace-Based Testing? for the category overview
Testing Overview
Automated JUnit Tests from Real Traces
Integration Testing with Deterministic Replay
Inter-Service API Verification
Runtime Verification for AI Agents

Turn Real Runtime Behavior into Regression Proof

BitDive captures real executions, compares before and after traces, and turns verified scenarios into deterministic JUnit replay protection.

Spring Boot Integration Testing: Full Context, Stubbed Boundaries -- when replay-based integration tests beat mock-heavy suites
Why AI Coding Agents Break APIs -- why source-only reasoning fails at runtime
Stop Cluttering Your Codebase with Brittle Generated Tests -- why replay data scales better than generated test code

Book a Demo or Try BitDive Free if you want trace-based testing without building a custom replay platform yourself.
