Multi-checkpoint step (multiple ANALYZE + combining ASSERT) authors green but fails deterministically on replay with null assert values #85

## Summary

A `testmd` step that emits **multiple `ANALYZE` checkpoints plus a combining `ASSERT`** (i.e. the step's prose asks the agent to verify more than one condition) **authors green but fails *deterministically* on replay.** On the replay run, every `ANALYZE` sub-checkpoint passes, but the final `assert` step fails with all of its checkpoint values (`expected`, `extracted_value`, `actual`) equal to `null`. The page is fine — nothing actually regressed.

This breaks the core value of saved tests: "author once, replay cheap and deterministic." Any multi-condition step is a latent replay failure.

## Environment

- `@testmuai/kane-cli` **0.4.1** (npm global, macOS / node v20.18.0)
- `mode: testing`, prod, profile-authenticated browser

## Repro

A single step whose prose asks for two conditions plus an explicit fail condition, e.g.:

```markdown
## Open the app and verify the signed-in identity
Open {{TM_BASE_URL}}/projects. Wait for the projects view to load.
Verify the top header shows a signed-in user's profile avatar / account menu,
AND that no email/password login form is present. The test MUST FAIL if no
profile avatar is present, or if a login form is shown instead.
```

kane-cli authors this as: `analyze` (avatar present) → `analyze` (login form not present) → combining `assert`.

1. `kane-cli testmd run <file> --author`  → **PASSES** (author_decisions:1).
2. `kane-cli testmd run <file>`            → **FAILS** (replay_decisions:1), every run, deterministically.
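The two commands above can be wrapped in a small script to confirm that the failure is deterministic rather than flaky. This is only a sketch: it assumes `kane-cli` exits non-zero when a run fails (not confirmed here), and `run_testmd` / `check_replay` are hypothetical helper names, not part of kane-cli.

```python
import subprocess
from typing import Callable, Dict, List


def run_testmd(path: str, author: bool = False) -> bool:
    """Run kane-cli once on a testmd file.

    ASSUMPTION: a failed run is signalled by a non-zero exit code.
    """
    cmd: List[str] = ["kane-cli", "testmd", "run", path]
    if author:
        cmd.append("--author")
    return subprocess.run(cmd).returncode == 0


def check_replay(
    path: str,
    replays: int = 3,
    runner: Callable[[str, bool], bool] = run_testmd,
) -> Dict[str, object]:
    """Author once, then replay several times and compare outcomes."""
    authored = runner(path, True)
    results = [runner(path, False) for _ in range(replays)]
    return {
        "authored": authored,
        "replays": results,
        # The bug as reported: authoring passes, every replay fails.
        "reproduces_bug": authored and not any(results),
    }
```

The `runner` parameter exists only so the comparison logic can be exercised without a real browser session; in normal use it can be left at its default.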

## Expected

Replay reproduces the authored result. If the sub-`analyze` checkpoints pass, the combining `assert` should pass.
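Stated as code, the expectation is a plain conjunction over the sub-checkpoint results. The function below is a hypothetical illustration of that rule, not kane-cli's actual implementation.

```python
from typing import Sequence


def expected_assert_status(analyze_statuses: Sequence[str]) -> str:
    """Expected outcome of a combining assert (illustrative only).

    The assert should pass exactly when there is at least one
    sub-checkpoint and every one of them passed.
    """
    if analyze_statuses and all(s == "passed" for s in analyze_statuses):
        return "passed"
    return "failed"
```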

## Actual

`run_summary.json` (replay run):

```
final_status: failed | reason: assertion_failed: @ step 5
 step 1 navigate => passed
 step 2 wait     => passed
 step 3 analyze  => passed   (avatar present = true)
 step 4 analyze  => passed   (login form not shown = true)
 step 5 assert   => failed   <-- combining assert
```

The step-5 assert's checkpoints carry no evaluation data on replay:

```json
[
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null },
  { "operator": "visual", "expected": null, "extracted_value": null, "actual": null }
]
```

So on replay the assert does not appear to be re-evaluated against the (passing) analyze results; it seems to default to failed.
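To tell this failure mode apart from a genuine regression, a triage script can flag assert checkpoints that carry no evaluation data at all. The sketch below assumes the checkpoint list has the shape shown above (the field names are taken from that excerpt); the rest of the `run_summary.json` layout is not assumed, and `unevaluated_checkpoints` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List

# Field names as they appear in the checkpoint excerpt above.
EVAL_FIELDS = ("expected", "extracted_value", "actual")


def unevaluated_checkpoints(checkpoints: List[Dict[str, Any]]) -> List[int]:
    """Return indices of checkpoints whose evaluation fields are all null."""
    return [
        i
        for i, cp in enumerate(checkpoints)
        if all(cp.get(field) is None for field in EVAL_FIELDS)
    ]


if __name__ == "__main__":
    sample = json.loads(
        """
        [
          {"operator": "visual", "expected": null, "extracted_value": null, "actual": null},
          {"operator": "visual", "expected": null, "extracted_value": null, "actual": null}
        ]
        """
    )
    # Every checkpoint being flagged suggests the assert was never evaluated.
    print(unevaluated_checkpoints(sample))
```

If every checkpoint of a failed assert is flagged, the failure is more likely this replay bug than a change in the page under test.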

## Workaround (verified)

Collapse each step to a **single positive assertion** (one `ANALYZE`/check, no combining `ASSERT` over multiple conditions). Reworded that way, the same test replays green consistently (verified 2×). Steps that were already single-checkpoint replay fine.

## Impact

- Multi-condition steps pass in the authoring run (or the first CI run) with no warning and then fail on every cheap replay, so a green author run is not a reliable signal that the saved test will replay.
- Forces authors to hand-split every assertion into its own step to get deterministic replays.

