
Regression Management in BitDive

Creating tests from traces is only half of the job. The other half is keeping those regression suites aligned with real, intended system behavior over time.

This page explains how to operate BitDive regression suites safely:

  • when to create a new suite
  • when to refresh an existing one
  • when to update only failed entries
  • when to replace one specific method surgically
  • when not to update anything because the failure is a real regression

The Core Rule

Verify first. Update second.

Never refresh a BitDive baseline just because a test turned red.

Before updating any regression suite, confirm:

  • the new runtime behavior is intended
  • the before and after trace difference is explainable
  • the wider test suite still makes sense in context

If you cannot explain the diff, do not bless it.


What You Are Managing

BitDive regression management revolves around a few objects:

  • trace / call_id: one recorded real execution
  • test group / test script: a saved suite built from traces
  • test entry / method entry: one method or scenario inside that group

Operationally, BitDive lets you manage regression memory at three levels:

  1. create a new suite
  2. refresh a whole existing suite
  3. replace only the failing or changed part

The safest choice is usually the smallest one that matches reality.
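
To make the relationship between these objects concrete, here is a minimal Python sketch of the data model. The class and field names (`Trace`, `MethodEntry`, `RegressionSuite`, `entry_id`, `method`) are hypothetical illustrations, not BitDive's actual schema; only `call_id` comes from this page. `replace_entry` mirrors the narrowest level above: it re-pins one entry and leaves the rest of the suite untouched.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trace:
    """One recorded real execution (hypothetical shape)."""
    call_id: str
    method: str  # illustrative, e.g. "OrderService.place"


@dataclass(frozen=True)
class MethodEntry:
    """One method or scenario inside a group, pinned to one trace."""
    entry_id: str
    trace: Trace


@dataclass
class RegressionSuite:
    """A test group / test script: a saved suite built from traces."""
    name: str
    entries: dict[str, MethodEntry] = field(default_factory=dict)

    def add(self, entry_id: str, trace: Trace) -> None:
        self.entries[entry_id] = MethodEntry(entry_id, trace)

    def replace_entry(self, entry_id: str, latest: Trace) -> None:
        """Surgical update: re-pin one entry, leave every other entry untouched."""
        if entry_id not in self.entries:
            raise KeyError(f"unknown entry: {entry_id}")
        self.entries[entry_id] = MethodEntry(entry_id, latest)
```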


Expected vs Unexpected Changes

This is the most important classification in daily work.

Expected Changes

These are changes you intended and verified:

  • a bug fix changed the error path
  • a refactor preserved output but reduced SQL calls
  • a DTO change intentionally modified the response contract
  • an infrastructure change legitimately changed the serialization format

For expected changes, update the regression baseline after verification.

Unexpected Changes

These are changes you did not intend:

  • unrelated methods started failing
  • extra downstream HTTP calls appeared
  • SQL count jumped for no business reason
  • response payload drifted outside the planned scope
  • a previously green path broke outside the changed area

For unexpected changes, fix the code. Do not refresh the suite.
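
The signals listed above can be checked mechanically once you have a before and after comparison. The sketch below assumes a hypothetical `TraceDiff` summary (SQL statement count, set of downstream HTTP calls, whether the response changed) and a hypothetical `ChangePlan` describing what you meant to change; neither is a BitDive data structure. Any returned reason marks the change as unexpected, which means fixing the code rather than refreshing the suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePlan:
    """What you intended to change (hypothetical structure)."""
    methods: frozenset[str]
    response_change: bool = False
    sql_increase: bool = False
    new_http_calls: frozenset[str] = frozenset()


@dataclass(frozen=True)
class TraceDiff:
    """Hypothetical before/after summary for one method's trace."""
    method: str
    sql_before: int
    sql_after: int
    http_before: frozenset[str]
    http_after: frozenset[str]
    response_changed: bool


def unexpected_reasons(diff: TraceDiff, plan: ChangePlan) -> list[str]:
    """Return why a diff looks unintended; an empty list means 'expected'."""
    reasons = []
    if diff.method not in plan.methods:
        reasons.append("method outside the changed area")
    # Downstream calls that appeared and were not part of the plan.
    extra_http = diff.http_after - diff.http_before - plan.new_http_calls
    if extra_http:
        reasons.append(f"unplanned downstream HTTP calls: {sorted(extra_http)}")
    if diff.sql_after > diff.sql_before and not plan.sql_increase:
        reasons.append(f"SQL count rose {diff.sql_before} -> {diff.sql_after}")
    if diff.response_changed and not plan.response_change:
        reasons.append("response payload changed outside the plan")
    return reasons
```

With illustrative names: a refactor inside `OrderService.place` that drops SQL calls from 5 to 3 yields no reasons, while a new `POST /audit` call from an untouched billing method yields two.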


Decision Table

Use this table to choose the right BitDive operation.

| Situation | Recommended Operation | Why |
|---|---|---|
| New service, no baseline yet | auto_generate_tests_for_service | Create a fresh suite from the latest good traces |
| Many methods changed intentionally across a service | update_existing_test_group | Bulk-refresh the suite to the new verified baseline |
| Only failed methods now reflect intended behavior | update_failed_tests_in_group | Refresh only the failed entries instead of rewriting the whole group |
| One specific method should be updated surgically | replace_test_with_latest_trace | Smallest safe change to the baseline |
| You need to understand why the suite failed before deciding | get_test_failure_details | Inspect the failure first, then choose an update strategy |
| You need to inspect the structure of the current suite | get_script_data | See what is actually inside the test group |

Recommended Workflow

1. Inspect the Failure

Start by understanding the current red state:

  • use get_test_failure_details to see which methods failed
  • use get_script_data if you need to inspect the full structure
  • compare before and after traces for the changed behavior

Do not jump straight to updating.

2. Classify the Failure

Ask:

  • Was this behavior intentionally changed?
  • Is the diff limited to the area I touched?
  • Is the runtime difference explainable?

If the answer to any of these is no, treat it as a real regression.

3. Choose the Smallest Valid Update

Use the narrowest update scope that matches reality:

  • one entry changed: replace one entry
  • a handful of expected failures: update failed entries
  • broad verified change: refresh the whole group
  • no baseline exists: create a new group
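
The selection rule can be written as a small function. This is a sketch under stated assumptions: the parameters and the `None` convention (meaning "do not update") are hypothetical, and only the returned tool names come from the decision table. Whether a change counts as a handful or as broad is a judgment call, so the sketch takes it as an explicit `broad_change` flag instead of inventing a numeric threshold.

```python
from typing import Optional


def choose_update(baseline_exists: bool, verified: bool,
                  failing: set[str], expected: set[str],
                  broad_change: bool = False) -> Optional[str]:
    """Pick the narrowest BitDive operation, or None when nothing should be updated."""
    if not baseline_exists:
        return "auto_generate_tests_for_service"
    if not verified:
        return None  # verify first: inspect failures and compare traces
    if failing - expected:
        return None  # at least one unexplained failure: fix the code instead
    if broad_change:
        return "update_existing_test_group"
    if len(failing) == 1:
        return "replace_test_with_latest_trace"
    if failing:
        return "update_failed_tests_in_group"
    return None  # nothing is red, nothing to update
```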

4. Run the Suite Again

After the update, rerun:

mvn test

The suite should now reflect the new intended baseline, not a mixture of old and unexplained behavior.
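
To confirm that the rerun is actually green rather than eyeballing console output, you can read the reports Maven writes. This sketch assumes the generated tests run under the Maven Surefire plugin with its default report directory, `target/surefire-reports`, where each test class produces a `TEST-*.xml` file whose `<testsuite>` root carries `tests`, `failures`, `errors`, and `skipped` counts.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

COUNTERS = ("tests", "failures", "errors", "skipped")


def surefire_summary(report_dir: str = "target/surefire-reports") -> dict[str, int]:
    """Sum the counters from every Surefire TEST-*.xml report in report_dir."""
    totals = dict.fromkeys(COUNTERS, 0)
    base = Path(report_dir)
    if not base.is_dir():
        return totals  # no reports: mvn test has not run, or wrote elsewhere
    for path in sorted(base.glob("TEST-*.xml")):
        root = ET.parse(path).getroot()  # one <testsuite> element per test class
        for key in COUNTERS:
            totals[key] += int(root.get(key, "0"))
    return totals


def suite_is_green(totals: dict[str, int]) -> bool:
    """Green means tests actually ran and none failed or errored."""
    return totals["tests"] > 0 and totals["failures"] == 0 and totals["errors"] == 0
```

Call `surefire_summary()` from the project root after `mvn test`. A result with zero tests usually means the reports went somewhere else, not that the suite passed, which is why `suite_is_green` treats it as not green.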


The Main Operations

1. Create a New Baseline

Use auto_generate_tests_for_service when:

  • a service does not yet have a useful regression suite
  • you are onboarding BitDive for a new service
  • you intentionally want a clean, current baseline from recent successful traces

This creates a new test group from the latest good executions across the service.

Use it when you need a starting point, not when you only need to repair one red method.

2. Refresh an Existing Test Group

Use update_existing_test_group when:

  • a broad, intentional change touched many methods
  • a service baseline legitimately moved
  • you already verified the new runtime behavior and want the suite to follow it

Typical examples:

  • major refactor with preserved business intent but different internal behavior
  • API layer update affecting many methods consistently
  • infrastructure change that intentionally changes serialized output across multiple traces

This is a bulk operation. Use it only when the change really is broad.

3. Update Only Failed Entries

Use update_failed_tests_in_group when:

  • only the currently failing methods should move to the new baseline
  • most of the suite is still valid
  • you want a quick, targeted refresh after intended changes

This is often the safest multi-entry option after a moderate change.

It avoids rewriting the whole suite while still saving time compared with one-by-one replacements.

4. Replace One Method Surgically

Use replace_test_with_latest_trace when:

  • exactly one method entry is outdated
  • one bug fix intentionally changed one path
  • you want minimal blast radius

This is the best option when you know precisely which method should move and nothing else should.


Safe Operational Patterns

Pattern A: Bug Fix with One Expected Change

  1. inspect the failing method
  2. compare before and after traces
  3. confirm the fix changed only the intended path
  4. use replace_test_with_latest_trace
  5. rerun the suite

Pattern B: Refactor with Several Expected Failures

  1. inspect the failure set
  2. confirm the new behavior is intended
  3. use update_failed_tests_in_group
  4. rerun the suite

Pattern C: Large Verified Baseline Shift

  1. verify behavior across the changed service
  2. confirm that the new broad diff is expected
  3. use update_existing_test_group
  4. rerun the suite and compare to the previous baseline
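
For the final comparison step, one option is to snapshot the test group before and after the refresh and diff the snapshots. The sketch assumes each snapshot is a hypothetical mapping from entry id to a behavior fingerprint you compute yourself (for example, a hash of the recorded response plus the SQL and HTTP call list); how you export it, for instance from get_script_data output, is not specified here. Anything reported by `outside_scope` moved without being part of the verified change and deserves a second look.

```python
def baseline_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots of a test group, each mapping entry id -> fingerprint."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
        "unchanged": sorted(k for k in shared if old[k] == new[k]),
    }


def outside_scope(diff: dict[str, list[str]], expected: set[str]) -> list[str]:
    """Entries that moved even though they were not part of the verified change."""
    moved = diff["added"] + diff["removed"] + diff["changed"]
    return sorted(set(moved) - expected)
```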

Pattern D: Unexpected Downstream Regression

  1. inspect the failure details
  2. compare traces
  3. identify the unrelated or unexplained behavior drift
  4. fix the code
  5. do not update the baseline yet

Anti-Patterns

Avoid these habits:

  • refreshing the whole suite because one test failed
  • updating failed methods before looking at trace comparison
  • blessing regressions outside the changed scope
  • using baseline refresh as a substitute for root-cause analysis
  • creating a brand new suite when a surgical replacement would do

BitDive is most valuable when the regression memory stays trustworthy.


Example MCP Sequences

Understand a failing suite

  1. get_all_test_scripts
  2. get_test_failure_details
  3. find_trace_summary
  4. compare_traces

Repair only the intended failures

  1. get_test_failure_details
  2. verify that each failure is expected
  3. update_failed_tests_in_group
  4. rerun mvn test

Replace one method entry

  1. get_test_failure_details
  2. identify the specific script_data_test_id
  3. replace_test_with_latest_trace
  4. rerun mvn test
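
Most MCP clients, such as an AI coding assistant connected to BitDive, assemble tool calls for you. For orientation, the sketch below builds the JSON-RPC 2.0 `tools/call` message that the Model Context Protocol defines for invoking a tool. The argument key `script_data_test_id` is taken from the step above, but this page does not document the tool's full input schema; treat the argument object as an assumption and check the schema the server advertises via `tools/list`.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request as defined by the MCP spec."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical arguments: the placeholder stands for the id you identified
# from get_test_failure_details; it is not a real value.
request = tools_call("replace_test_with_latest_trace",
                     {"script_data_test_id": "<id from get_test_failure_details>"})
```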

Refresh the whole baseline

  1. verify the broad change with trace comparison
  2. update_existing_test_group
  3. rerun mvn test
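
The sequences above share two invariants: something is inspected before any update tool runs, and mvn test runs after it. A hypothetical lint over a plan written as a list of step names can catch the anti-patterns listed earlier before they reach the baseline. The step names are the MCP tool names from this page plus the literal string `mvn test`; the rule is deliberately lenient and accepts either get_test_failure_details or compare_traces as inspection.

```python
UPDATE_TOOLS = {
    "update_existing_test_group",
    "update_failed_tests_in_group",
    "replace_test_with_latest_trace",
}
INSPECT_STEPS = {"get_test_failure_details", "compare_traces"}
RERUN = "mvn test"


def lint_sequence(steps: list[str]) -> list[str]:
    """Flag update steps that skip inspection or are not followed by a rerun."""
    problems = []
    for i, step in enumerate(steps):
        if step not in UPDATE_TOOLS:
            continue
        if not (INSPECT_STEPS & set(steps[:i])):
            problems.append(f"step {i + 1} ({step}): no inspection before update")
        if RERUN not in steps[i + 1:]:
            problems.append(f"step {i + 1} ({step}): suite not rerun afterwards")
    return problems
```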

Relationship to the Developer Workflow

Regression management is the fourth stage of the standard BitDive change cycle:

  1. establish the behavioral baseline
  2. make a focused code change
  3. verify with before and after traces
  4. refresh only the intended regression memory

If you skip stage 3 and jump straight to stage 4, you are no longer managing regression baselines. You are just hiding failures.


Regression management is where BitDive stops being a one-time test generator and becomes long-lived system memory.