Regression Management in BitDive

Creating tests from traces is only half of the job. The other half is keeping those regression suites aligned with real, intended system behavior over time.

This page explains how to operate BitDive regression suites safely:

when to create a new suite
when to refresh an existing one
when to update only failed entries
when to replace one specific method surgically
when not to update anything because the failure is a real regression

The Core Rule

Verify first. Update second.

Never refresh a BitDive baseline just because a test turned red.

Before updating any regression suite, confirm:

the new runtime behavior is intended
the before and after trace difference is explainable
the wider test suite still makes sense in context

If you cannot explain the diff, do not bless it.
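
The checklist above can be written down as a simple gate. This is a minimal sketch, not part of BitDive: `UpdateCheck` and `may_update_baseline` are hypothetical names, and the three flags stand for answers a reviewer records after looking at the traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateCheck:
    """Answers recorded by a reviewer (hypothetical type, not a BitDive API)."""
    behavior_intended: bool   # the new runtime behavior is intended
    diff_explainable: bool    # the before/after trace difference is explainable
    suite_makes_sense: bool   # the wider test suite still makes sense in context


def may_update_baseline(check: UpdateCheck) -> bool:
    """Allow a baseline update only when every check passed."""
    return (
        check.behavior_intended
        and check.diff_explainable
        and check.suite_makes_sense
    )
```

A single unanswered or negative check keeps the baseline untouched, which mirrors "verify first, update second".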

What You Are Managing

BitDive regression management revolves around a few objects:

trace / call_id: one recorded real execution
test group / test script: a saved suite built from traces
test entry / method entry: one method or scenario inside that group
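
To make the relationship between these objects concrete, here is a small illustrative model. Only `call_id` comes from the text above. The class names and the `method` and `name` fields are assumptions chosen for the sketch and do not describe BitDive's actual storage format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    call_id: str  # identifies one recorded real execution


@dataclass
class MethodEntry:
    method: str   # hypothetical field: the method or scenario covered
    trace: Trace  # the recorded execution this entry is built from


@dataclass
class RegressionGroup:
    name: str  # hypothetical field: label of the saved suite
    entries: List[MethodEntry] = field(default_factory=list)


# Example: one group holding a single entry backed by one trace.
group = RegressionGroup(
    name="orders",
    entries=[MethodEntry("OrderService.create", Trace(call_id="abc"))],
)
```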

Operationally, BitDive lets you manage regression memory at three levels:

create a new suite
refresh a whole existing suite
replace only the failing or changed part

The safest choice is usually the smallest one that matches reality.

Expected vs Unexpected Changes

This is the most important classification in daily work.

Expected Changes

These are changes you intended and verified:

bug fix changed the error path
refactor preserved output but reduced SQL calls
DTO change intentionally modified the response contract
infrastructure change legitimately changed serialization format

For expected changes, update the regression baseline after verification.

Unexpected Changes

These are changes you did not intend:

unrelated methods started failing
extra downstream HTTP calls appeared
SQL count jumped for no business reason
response payload drifted outside the planned scope
a previously green path broke outside the changed area

For unexpected changes, fix the code. Do not refresh the suite.
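
One rough way to pre-sort a red suite is by scope: failures inside the area you meant to change are candidates for expected changes, while anything outside it is suspect. The sketch below assumes you can list the failing method names and the methods you intended to affect. Both inputs and the function name are illustrative. In-scope failures still need trace verification before any update.

```python
from typing import Iterable, List, Set, Tuple


def split_by_scope(
    failed: Iterable[str], planned_scope: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split failing methods into (in_scope, out_of_scope), each sorted."""
    in_scope: List[str] = []
    out_of_scope: List[str] = []
    for method in sorted(set(failed)):
        if method in planned_scope:
            in_scope.append(method)      # candidate expected change
        else:
            out_of_scope.append(method)  # likely unexpected change
    return in_scope, out_of_scope


candidates, suspects = split_by_scope(
    ["OrderService.create", "BillingClient.charge"],
    planned_scope={"OrderService.create"},
)
```

Here `candidates` holds `OrderService.create` and `suspects` holds `BillingClient.charge`, so the second failure should be treated as a possible regression rather than refreshed.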

Decision Table

Use this table to choose the right BitDive operation.

Situation	Recommended Operation	Why
New service, no baseline yet	`auto_generate_tests_for_service`	Create a fresh suite from the latest good traces
Many methods changed intentionally across a service	`update_existing_test_group`	Bulk refresh the suite to the new verified baseline
Only failed methods now reflect intended behavior	`update_failed_tests_in_group`	Refresh only the failed entries instead of rewriting the whole group
One specific method should be updated surgically	`replace_test_with_latest_trace`	Smallest safe change to the baseline
You need to understand why the suite failed before deciding	`get_test_failure_details`	Inspect the failure first, then choose an update strategy
You need to inspect the structure of the current suite	`get_script_data`	See what is actually inside the test group

Recommended Workflow

1. Inspect the Failure

Start by understanding the current red state:

use get_test_failure_details to see which methods failed
use get_script_data if you need to inspect the full structure
compare before and after traces for the changed behavior

Do not jump straight to updating.

2. Classify the Failure

Ask:

Was this behavior intentionally changed?
Is the diff limited to the area I touched?
Is the runtime difference explainable?

If the answer to any of these questions is no, treat the failure as a real regression.

3. Choose the Smallest Valid Update

Use the narrowest update scope that matches reality:

one entry changed: replace one entry
a handful of expected failures: update failed entries
broad verified change: refresh the whole group
no baseline exists: create a new group
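
The scope rules above can be sketched as a small selection function. The operation names come from the decision table. The function itself, its parameters, and the ordering of the checks are assumptions made for illustration, and the function expects that every counted failure has already been verified as intended.

```python
from typing import Optional


def choose_operation(
    baseline_exists: bool, verified_failures: int, broad_change: bool
) -> Optional[str]:
    """Return the narrowest operation name that fits, or None if nothing should change."""
    if not baseline_exists:
        return "auto_generate_tests_for_service"   # no baseline: create a new group
    if verified_failures <= 0:
        return None                                # nothing verified: do not update
    if verified_failures == 1:
        return "replace_test_with_latest_trace"    # one entry changed
    if broad_change:
        return "update_existing_test_group"        # broad verified change
    return "update_failed_tests_in_group"          # a handful of expected failures
```

Checking the single-entry case before the broad case keeps the blast radius as small as possible when only one entry actually needs to move.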

4. Run the Suite Again

After the update, rerun:

mvn test

The suite should now reflect the new intended baseline, not a mixture of old and unexplained behavior.
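
If the rerun is scripted, a thin wrapper around the command is enough. This sketch assumes Maven is on the `PATH`, that the working directory is the project root, and that the build exits with a non-zero status when tests fail, which is Maven's default behavior. The function name and parameters are illustrative.

```python
import subprocess
from typing import Optional, Sequence


def rerun_suite(
    command: Sequence[str] = ("mvn", "test"), cwd: Optional[str] = None
) -> bool:
    """Run the test command and report whether it exited with status 0."""
    result = subprocess.run(list(command), cwd=cwd)
    return result.returncode == 0
```

A `False` result after an update means the suite still contains behavior that was not part of the verified change.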

The Main Operations

1. Create a New Baseline

Use auto_generate_tests_for_service when:

a service does not yet have a useful regression suite
you are onboarding BitDive for a new service
you intentionally want a clean, current baseline from recent successful traces

This creates a new test group from the latest good executions across the service.

Use it when you need a starting point, not when you only need to repair one red method.

2. Refresh an Existing Test Group

Use update_existing_test_group when:

a broad, intentional change touched many methods
a service baseline legitimately moved
you already verified the new runtime behavior and want the suite to follow it

Typical examples:

major refactor with preserved business intent but different internal behavior
API layer update affecting many methods consistently
infrastructure change that intentionally changes serialized output across multiple traces

This is a bulk operation. Use it only when the change really is broad.

3. Update Only Failed Entries

Use update_failed_tests_in_group when:

only the currently failing methods should move to the new baseline
most of the suite is still valid
you want a quick, targeted refresh after intended changes

This is often the safest bulk-ish option after a moderate change.

It avoids rewriting the whole suite while still saving time versus one-by-one replacements.

4. Replace One Method Surgically

Use replace_test_with_latest_trace when:

exactly one method entry is outdated
one bug fix intentionally changed one path
you want minimal blast radius

This is the best option when you know precisely which method should move and nothing else should.

Safe Operational Patterns

Pattern A: Bug Fix with One Expected Change

inspect the failing method
compare before and after traces
confirm the fix changed only the intended path
use replace_test_with_latest_trace
rerun the suite

Pattern B: Refactor with Several Expected Failures

inspect the failure set
confirm the new behavior is intended
use update_failed_tests_in_group
rerun the suite

Pattern C: Large Verified Baseline Shift

verify behavior across the changed service
confirm that the new broad diff is expected
use update_existing_test_group
rerun the suite and compare to the previous baseline

Pattern D: Unexpected Downstream Regression

inspect the failure details
compare traces
identify the unrelated or unexplained behavior drift
fix the code
do not update the baseline yet

Anti-Patterns

Avoid these habits:

refreshing the whole suite because one test failed
updating failed methods before looking at the trace comparison
blessing regressions outside the changed scope
using baseline refresh as a substitute for root-cause analysis
creating a brand new suite when a surgical replacement would do

BitDive is most valuable when the regression memory stays trustworthy.

Example MCP Sequences

Understand a failing suite

get_all_test_scripts
get_test_failure_details
find_trace_summary
compare_traces

Repair only the intended failures

get_test_failure_details
verify that each failure is expected
update_failed_tests_in_group
rerun mvn test

Replace one method entry

get_test_failure_details
identify the specific script_data_test_id
replace_test_with_latest_trace
rerun mvn test
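
The sequence above could be scripted roughly as follows. This is a sketch under stated assumptions, not BitDive's client API: `call_tool` is a hypothetical callable that forwards a tool name and arguments to whatever MCP client you use, and the argument name `test_group_id`, the `method` key, and the list-of-dicts response shape are invented for illustration. Only the tool names and `script_data_test_id` appear in this guide.

```python
from typing import Any, Callable, Dict

# Hypothetical adapter: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def replace_one_entry(
    call_tool: ToolCaller, group_id: str, method: str, verified: bool
) -> Any:
    """Replace exactly one failing entry, and only after human verification."""
    if not verified:
        raise ValueError("verify the trace diff before touching the baseline")

    # ASSUMPTION: argument name and response shape are hypothetical.
    failures = call_tool("get_test_failure_details", {"test_group_id": group_id})
    matches = [f for f in failures if f["method"] == method]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one failing entry for {method}, found {len(matches)}"
        )

    # ASSUMPTION: the entry is addressed by its script_data_test_id.
    return call_tool(
        "replace_test_with_latest_trace",
        {"script_data_test_id": matches[0]["script_data_test_id"]},
    )
```

Refusing to proceed without the `verified` flag, and failing when the match is ambiguous, keeps the replacement surgical.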

Refresh the whole baseline

verify the broad change with trace comparison
update_existing_test_group
rerun mvn test

Relationship to the Developer Workflow

Regression management is the fourth stage of the standard BitDive change cycle:

establish the behavioral baseline
make a focused code change
verify with before and after traces
refresh only the intended regression memory

If you skip stage 3 (verifying with before and after traces) and jump straight to stage 4 (refreshing the regression memory), you are no longer managing regression baselines. You are just hiding failures.

Regression management is where BitDive stops being a one-time test generator and becomes long-lived system memory.
