Regression Management in BitDive
Creating tests from traces is only half of the job. The other half is keeping those regression suites aligned with real, intended system behavior over time.
This page explains how to operate BitDive regression suites safely:
- when to create a new suite
- when to refresh an existing one
- when to update only failed entries
- when to replace one specific method surgically
- when not to update anything because the failure is a real regression
The Core Rule
Verify first. Update second.
Never refresh a BitDive baseline just because a test turned red.
Before updating any regression suite, confirm:
- the new runtime behavior is intended
- the before and after trace difference is explainable
- the wider test suite still makes sense in context
If you cannot explain the diff, do not bless it.
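The checklist above can be read as a simple gate. The following is a minimal sketch, not part of BitDive itself: the `Verification` class and `may_update_baseline` function are hypothetical names introduced only to illustrate that all three confirmations must hold before any baseline update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verification:
    """Answers to the three pre-update checks (hypothetical structure)."""
    behavior_intended: bool   # the new runtime behavior is intended
    diff_explainable: bool    # the before/after trace difference is explainable
    suite_consistent: bool    # the wider test suite still makes sense in context


def may_update_baseline(v: Verification) -> bool:
    """Return True only when every check passed; otherwise the baseline stays."""
    return v.behavior_intended and v.diff_explainable and v.suite_consistent
```

For example, `may_update_baseline(Verification(True, False, True))` is false: an intended change with an unexplained diff still does not qualify for an update.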
What You Are Managing
BitDive regression management revolves around a few objects:
- trace / call_id: one recorded real execution
- test group / test script: a saved suite built from traces
- test entry / method entry: one method or scenario inside that group
Operationally, BitDive lets you manage regression memory at three levels:
- create a new suite
- refresh a whole existing suite
- replace only the failing or changed part
The safest choice is usually the smallest one that matches reality.
Expected vs Unexpected Changes
This is the most important classification in daily work.
Expected Changes
These are changes you intended and verified:
- bug fix changed the error path
- refactor preserved output but reduced SQL calls
- DTO change intentionally modified the response contract
- infrastructure change legitimately changed serialization format
For expected changes, update the regression baseline after verification.
Unexpected Changes
These are changes you did not intend:
- unrelated methods started failing
- extra downstream HTTP calls appeared
- SQL count jumped for no business reason
- response payload drifted outside the planned scope
- a previously green path broke outside the changed area
For unexpected changes, fix the code. Do not refresh the suite.
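One way to make this classification concrete is to compare the failed methods against the area you actually touched. The sketch below is an illustration under assumptions: `classify_failures` and the method names in the usage note are hypothetical, and a failure inside the touched area is only a candidate for an expected change until the traces confirm it.

```python
def classify_failures(failed_methods, touched_methods):
    """Split failed methods into expected candidates and unexpected ones.

    A failure inside the touched area is only a *candidate* for an expected
    change and still needs trace verification. A failure outside the touched
    area is treated as unexpected.
    """
    touched = set(touched_methods)
    failed = set(failed_methods)
    candidates = sorted(m for m in failed if m in touched)
    unexpected = sorted(m for m in failed if m not in touched)
    return candidates, unexpected
```

Calling it with failures in `OrderService.create` and `BillingService.charge` while only `OrderService.create` was changed puts the billing failure in the unexpected list, which means fixing the code rather than refreshing the suite.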
Decision Table
Use this table to choose the right BitDive operation.
| Situation | Recommended Operation | Why |
|---|---|---|
| New service, no baseline yet | auto_generate_tests_for_service | Create a fresh suite from the latest good traces |
| Many methods changed intentionally across a service | update_existing_test_group | Bulk refresh the suite to the new verified baseline |
| Only failed methods now reflect intended behavior | update_failed_tests_in_group | Refresh only the failed entries instead of rewriting the whole group |
| One specific method should be updated surgically | replace_test_with_latest_trace | Smallest safe change to the baseline |
| You need to understand why the suite failed before deciding | get_test_failure_details | Inspect the failure first, then choose an update strategy |
| You need to inspect the structure of the current suite | get_script_data | See what is actually inside the test group |
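The table can also be expressed as a lookup, which is convenient when an agent or script has to pick an operation. This is a minimal sketch: the `Situation` enum and `recommended_operation` function are hypothetical helpers, while the operation names are the ones listed in the table.

```python
from enum import Enum


class Situation(Enum):
    """Rows of the decision table (hypothetical labels)."""
    NO_BASELINE = "new service, no baseline yet"
    BROAD_INTENDED_CHANGE = "many methods changed intentionally"
    ONLY_FAILED_INTENDED = "only failed methods reflect intended behavior"
    SINGLE_METHOD = "one specific method should be updated"
    NEED_FAILURE_CAUSE = "need to understand why the suite failed"
    NEED_SUITE_STRUCTURE = "need to inspect the current suite"


# Situation -> recommended BitDive operation, copied from the table.
OPERATION_FOR = {
    Situation.NO_BASELINE: "auto_generate_tests_for_service",
    Situation.BROAD_INTENDED_CHANGE: "update_existing_test_group",
    Situation.ONLY_FAILED_INTENDED: "update_failed_tests_in_group",
    Situation.SINGLE_METHOD: "replace_test_with_latest_trace",
    Situation.NEED_FAILURE_CAUSE: "get_test_failure_details",
    Situation.NEED_SUITE_STRUCTURE: "get_script_data",
}


def recommended_operation(situation: Situation) -> str:
    """Return the operation name the table recommends for a situation."""
    return OPERATION_FOR[situation]
```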
Recommended Workflow
1. Inspect the Failure
Start by understanding the current red state:
- use get_test_failure_details to see which methods failed
- use get_script_data if you need to inspect the full structure
- compare before and after traces for the changed behavior
Do not jump straight to updating.
2. Classify the Failure
Ask:
- Was this behavior intentionally changed?
- Is the diff limited to the area I touched?
- Is the runtime difference explainable?
If the answer to any of these is no, treat it as a real regression.
3. Choose the Smallest Valid Update
Use the narrowest update scope that matches reality:
- one entry changed: replace one entry
- a handful of expected failures: update failed entries
- broad verified change: refresh the whole group
- no baseline exists: create a new group
4. Run the Suite Again
After the update, rerun:
mvn test
The suite should now reflect the new intended baseline, not a mixture of old and unexplained behavior.
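If the rerun is driven from a script rather than typed by hand, it can be wrapped as below. This is a sketch under assumptions: `rerun_suite` is a hypothetical helper, and it assumes `mvn` is on the PATH and that a zero exit status means the suite passed.

```python
import subprocess
import sys


def rerun_suite(command=("mvn", "test")):
    """Run the test command and report whether it exited with status 0."""
    completed = subprocess.run(list(command), check=False)
    if completed.returncode != 0:
        # A non-zero status means the suite is still red after the update.
        print(f"suite failed with exit code {completed.returncode}", file=sys.stderr)
        return False
    return True
```

A false result after an update means the baseline and the code still disagree, so go back to inspecting the failure instead of updating again.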
The Main Operations
1. Create a New Baseline
Use auto_generate_tests_for_service when:
- a service does not yet have a useful regression suite
- you are onboarding BitDive for a new service
- you intentionally want a clean, current baseline from recent successful traces
This creates a new test group from the latest good executions across the service.
Use it when you need a starting point, not when you only need to repair one red method.
2. Refresh an Existing Test Group
Use update_existing_test_group when:
- a broad, intentional change touched many methods
- a service baseline legitimately moved
- you already verified the new runtime behavior and want the suite to follow it
Typical examples:
- major refactor with preserved business intent but different internal behavior
- API layer update affecting many methods consistently
- infrastructure change that intentionally changes serialized output across multiple traces
This is a bulk operation. Use it only when the change really is broad.
3. Update Only Failed Entries
Use update_failed_tests_in_group when:
- only the currently failing methods should move to the new baseline
- most of the suite is still valid
- you want a quick, targeted refresh after intended changes
This is often the safest bulk-ish option after a moderate change.
It avoids rewriting the whole suite while still saving time versus one-by-one replacements.
4. Replace One Method Surgically
Use replace_test_with_latest_trace when:
- exactly one method entry is outdated
- one bug fix intentionally changed one path
- you want minimal blast radius
This is the best option when you know precisely which method should move and nothing else should.
Safe Operational Patterns
Pattern A: Bug Fix with One Expected Change
- inspect the failing method
- compare before and after traces
- confirm the fix changed only the intended path
- use replace_test_with_latest_trace
- rerun the suite
Pattern B: Refactor with Several Expected Failures
- inspect the failure set
- confirm the new behavior is intended
- use update_failed_tests_in_group
- rerun the suite
Pattern C: Large Verified Baseline Shift
- verify behavior across the changed service
- confirm that the new broad diff is expected
- use update_existing_test_group
- rerun the suite and compare to the previous baseline
Pattern D: Unexpected Downstream Regression
- inspect the failure details
- compare traces
- identify the unrelated or unexplained behavior drift
- fix the code
- do not update the baseline yet
Anti-Patterns
Avoid these habits:
- refreshing the whole suite because one test failed
- updating failed methods before looking at trace comparison
- blessing regressions outside the changed scope
- using baseline refresh as a substitute for root-cause analysis
- creating a brand new suite when a surgical replacement would do
BitDive is most valuable when the regression memory stays trustworthy.
Example MCP Sequences
Understand a failing suite
- get_all_test_scripts
- get_test_failure_details
- find_trace_summary
- compare_traces
Repair only the intended failures
- get_test_failure_details
- verify each failure as expected
- update_failed_tests_in_group
- rerun mvn test
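The "repair only the intended failures" sequence could be scripted roughly as follows. This is a sketch, not BitDive's client API: `call_tool` is a hypothetical callable that invokes an MCP tool by name, and the `group_id` argument and the `failures` key in the response are assumed shapes chosen for illustration. Check the MCP Tools Reference for the real parameters.

```python
def repair_intended_failures(call_tool, group_id, is_expected):
    """Update failed entries only if every failure was verified as expected.

    call_tool:   hypothetical callable (tool_name, arguments) -> dict
    group_id:    hypothetical identifier of the test group
    is_expected: reviewer-supplied predicate applied to one failure record
    """
    # Step 1: inspect the failures (argument and response shape are assumptions).
    details = call_tool("get_test_failure_details", {"group_id": group_id})
    failures = details.get("failures", [])

    # Step 2: verify each failure as expected.
    unexpected = [f for f in failures if not is_expected(f)]
    if unexpected:
        # Real regressions: leave the baseline alone and fix the code.
        return {"updated": False, "unexpected": unexpected}

    # Step 3: refresh only the failed entries.
    if failures:
        call_tool("update_failed_tests_in_group", {"group_id": group_id})
    return {"updated": bool(failures), "unexpected": []}
```

The function refuses to update anything if even one failure is unverified, which keeps the "verify first, update second" rule in the control flow rather than in the operator's memory. Rerunning mvn test afterwards remains a separate step.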
Replace one method entry
- get_test_failure_details
- identify the specific script_data_test_id
- replace_test_with_latest_trace
- rerun mvn test
Refresh the whole baseline
- verify the broad change with trace comparison
- update_existing_test_group
- rerun mvn test
Relationship to the Developer Workflow
Regression management is the fourth stage of the standard BitDive change cycle:
1. establish the behavioral baseline
2. make a focused code change
3. verify with before and after traces
4. refresh only the intended regression memory
If you skip stage 3 and jump straight to stage 4, you are no longer managing regression baselines. You are just hiding failures.
Related Guides
- The BitDive Developer Workflow
- BitDive MCP Tools Reference
- Autonomous Quality Loop for AI Agents
- CI/CD Integration
Regression management is where BitDive stops being a one-time test generator and becomes long-lived system memory.