Summary
A testmd step that emits multiple ANALYZE checkpoints plus a combining ASSERT (i.e. the step's prose asks the agent to verify more than one condition) authors green but fails deterministically on replay. On the replay run, every ANALYZE sub-checkpoint passes, but the final assert step fails with all of its checkpoint values (expected, extracted_value, actual) equal to null. The page is fine — nothing actually regressed.
This breaks the core value of saved tests: "author once, replay cheap and deterministic." Any multi-condition step is a latent replay failure.
Environment
@testmuai/kane-cli 0.4.1 (npm global, macOS / node v20.18.0)
mode: testing, prod, profile-authenticated browser
Repro
A single step whose prose asks for two things + a fail condition, e.g.:
## Open the app and verify the signed-in identity
Open {{TM_BASE_URL}}/projects. Wait for the projects view to load.
Verify the top header shows a signed-in user's profile avatar / account menu,
AND that no email/password login form is present. The test MUST FAIL if no
profile avatar is present, or if a login form is shown instead.
kane-cli authors this as: analyze (avatar present) → analyze (login form not present) → combining assert.
kane-cli testmd run <file> --author → PASSES (author_decisions:1).
kane-cli testmd run <file> → FAILS (replay_decisions:1), every run, deterministically.
Expected
Replay reproduces the authored result. If the sub-analyze checkpoints pass, the combining assert should pass.
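The expectation above can be written down as a minimal sketch. This states the reporter's expected semantics only, not kane-cli's actual implementation; `AnalyzeResult` and `expected_combining_assert` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnalyzeResult:
    """Hypothetical record of one ANALYZE sub-checkpoint outcome."""
    description: str
    passed: bool


def expected_combining_assert(results: list[AnalyzeResult]) -> bool:
    """Expected outcome: the combining assert passes only if every
    preceding analyze checkpoint passed."""
    return all(r.passed for r in results)


# The two sub-checkpoints from the repro, both passing on replay.
replay_results = [
    AnalyzeResult("avatar present", True),
    AnalyzeResult("login form not shown", True),
]
print(expected_combining_assert(replay_results))
```

Under this reading, a replay in which both analyze checkpoints pass should yield a passing assert.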
Actual
run_summary.json (replay run):
final_status: failed | reason: assertion_failed: @ step 5
step 1 navigate => passed
step 2 wait => passed
step 3 analyze => passed (avatar present = true)
step 4 analyze => passed (login form not shown = true)
step 5 assert => failed <-- combining assert
The step-5 assert's checkpoints carry no evaluation data on replay:
[
{ "operator": "visual", "expected": null, "extracted_value": null, "actual": null },
{ "operator": "visual", "expected": null, "extracted_value": null, "actual": null }
]
So on replay the assert does not appear to be re-evaluated against the (passing) analyze results; with no evaluation data, it defaults to failed.
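To tell this failure apart from a real regression, the null-checkpoint signature can be checked mechanically. The sketch below is a hedged illustration: it takes the checkpoint array itself as input, because the location of that array inside run_summary.json is not shown in this report. `is_unevaluated` is a hypothetical helper name.

```python
import json

# Fields that are all null in the failing step-5 checkpoints.
NULL_FIELDS = ("expected", "extracted_value", "actual")


def is_unevaluated(checkpoints: list[dict]) -> bool:
    """True if there is at least one checkpoint and every checkpoint
    has null expected / extracted_value / actual."""
    return bool(checkpoints) and all(
        all(cp.get(field) is None for field in NULL_FIELDS)
        for cp in checkpoints
    )


# The step-5 checkpoints exactly as reported above.
reported = json.loads("""
[
  {"operator": "visual", "expected": null, "extracted_value": null, "actual": null},
  {"operator": "visual", "expected": null, "extracted_value": null, "actual": null}
]
""")

print(is_unevaluated(reported))
```

A failed assert whose checkpoints match this signature was most likely never evaluated, rather than evaluated and found false.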
Workaround (verified)
Collapse each step to a single positive assertion (one ANALYZE/check, no combining ASSERT over multiple conditions). Reworded that way, the same test replays green consistently (verified 2×). Steps that were already single-checkpoint replay fine.
Impact
- Multi-condition steps silently pass in authoring/CI-first-run and then fail on every cheap replay, so a green author run is not a reliable signal that the saved test will replay.
- Forces authors to hand-split every assertion into its own step to get deterministic replays.