Multi-checkpoint step (multiple ANALYZE + combining ASSERT) authors green but fails deterministically on replay with null assert values #85

@komal-lt

Summary

A testmd step that emits multiple ANALYZE checkpoints plus a combining ASSERT (i.e. the step's prose asks the agent to verify more than one condition) authors green but fails deterministically on replay. On the replay run, every ANALYZE sub-checkpoint passes, but the final assert step fails with all of its checkpoint values (expected, extracted_value, actual) equal to null. The page is fine — nothing actually regressed.

This breaks the core value of saved tests: "author once, replay cheap and deterministic." Any multi-condition step is a latent replay failure.

Environment

  • @testmuai/kane-cli 0.4.1 (npm global, macOS / node v20.18.0)
  • mode: testing, prod, profile-authenticated browser

Repro

A single step whose prose asks for two things + a fail condition, e.g.:

## Open the app and verify the signed-in identity
Open {{TM_BASE_URL}}/projects. Wait for the projects view to load.
Verify the top header shows a signed-in user's profile avatar / account menu,
AND that no email/password login form is present. The test MUST FAIL if no
profile avatar is present, or if a login form is shown instead.

kane-cli authors this as: analyze (avatar present) → analyze (login form not present) → combining assert.

  1. kane-cli testmd run <file> --author → PASSES (author_decisions:1).
  2. kane-cli testmd run <file> → FAILS (replay_decisions:1), every run, deterministically.
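
For anyone who wants to script the two-command repro above, here is a minimal sketch. It only uses the two invocations quoted in this issue. How kane-cli sets its exit code is not stated here, so the script records the return codes without interpreting them. The helper names and the injectable `runner` parameter are hypothetical, added only so the loop can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List, Tuple


def author_cmd(test_file: str) -> List[str]:
    # Authoring invocation, as quoted in the repro steps.
    return ["kane-cli", "testmd", "run", test_file, "--author"]


def replay_cmd(test_file: str) -> List[str]:
    # Replay invocation, as quoted in the repro steps.
    return ["kane-cli", "testmd", "run", test_file]


def run_repro(
    test_file: str,
    replays: int = 3,
    runner: Callable = subprocess.run,
) -> List[Tuple[str, int]]:
    """Author once, then replay `replays` times.

    Returns (label, returncode) pairs. The meaning of the return code is an
    assumption left to the reader; the issue does not document it.
    """
    results = [("author", runner(author_cmd(test_file)).returncode)]
    for i in range(replays):
        proc = runner(replay_cmd(test_file))
        results.append((f"replay {i + 1}", proc.returncode))
    return results
```

Calling `run_repro("identity.test.md")` (a placeholder file name) would run the author pass followed by three replays; the pass/fail verdict for each run still has to be read from that run's run_summary.json.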

Expected

Replay reproduces the authored result. If the sub-analyze checkpoints pass, the combining assert should pass.
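
To make the expectation concrete, here is a minimal sketch of the semantics I'd expect from a combining assert: it passes exactly when every sub-check passed. This is not kane-cli's implementation; `SubCheck` and `combine` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubCheck:
    name: str
    passed: bool


def combine(sub_checks: List[SubCheck]) -> bool:
    # Pass only if there is at least one sub-check and all of them passed.
    return bool(sub_checks) and all(c.passed for c in sub_checks)


# The two analyze results from the replay run in this issue.
checks = [
    SubCheck("avatar present", True),
    SubCheck("login form not shown", True),
]
print(combine(checks))  # True
```

Under these semantics the replay run in this report would be green, since both analyze checkpoints passed.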

Actual

run_summary.json (replay run):

final_status: failed | reason: assertion_failed: @ step 5
 step 1 navigate => passed
 step 2 wait     => passed
 step 3 analyze  => passed   (avatar present = true)
 step 4 analyze  => passed   (login form not shown = true)
 step 5 assert   => failed   <-- combining assert

The step-5 assert's checkpoints carry no evaluation data on replay:

[
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null },
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null }
]

So the assert isn't re-evaluated against the (passing) analyze results on replay — it defaults to failed.
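
A small sketch for telling this failure mode apart from a genuine assertion failure: it flags an assert whose checkpoints all have null evaluation fields. The field names come from the checkpoint dump above. Everything else (the function names, the labels returned, and how the checkpoint list is pulled out of run_summary.json) is an assumption.

```python
import json
from typing import Dict, List

# Evaluation fields seen in the step-5 checkpoint dump.
EVAL_FIELDS = ("expected", "extracted_value", "actual")


def is_unevaluated(checkpoint: Dict) -> bool:
    # True when none of the evaluation fields carries a value.
    return all(checkpoint.get(field) is None for field in EVAL_FIELDS)


def classify_assert(checkpoints: List[Dict]) -> str:
    if not checkpoints:
        return "no_checkpoints"
    if all(is_unevaluated(c) for c in checkpoints):
        return "unevaluated"  # the symptom reported in this issue
    return "evaluated"


sample = json.loads("""
[
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null },
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null }
]
""")
print(classify_assert(sample))  # unevaluated
```

A failed assert classified as "unevaluated" points at replay not evaluating the checkpoints, rather than at the page under test.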

Workaround (verified)

Collapse each step to a single positive assertion (one ANALYZE/check, no combining ASSERT over multiple conditions). Reworded that way, the same test replays green consistently (verified 2×). Steps that were already single-checkpoint replay fine.

Impact

  • Multi-condition steps silently pass in authoring/CI-first-run and then fail on every cheap replay, so a green author run is not a reliable signal that the saved test will replay.
  • Forces authors to hand-split every assertion into its own step to get deterministic replays.
