Skip to content

internal/planner: three documented invariants are unguarded by tests #392

@bdchatham

Description

@bdchatham

Problem

Three internal/planner invariants are documented in doc.go (sei-k8s-controller#391) and listed in CLAUDE.md Key Patterns, but no test directly guards them — a regression of any of the three compiles, lints, and passes CI today:

  1. Atomic plan creation (persist-before-execute ordering). TestReconcile_CreatesPlanOnFirstRun asserts the plan is persisted and Active, but does not assert zero task submissions on the creating reconcile. Its comment even says "and executes the first task," which contradicts the planAlreadyActive gate (the creating reconcile requeues without executing). The ordering — plan visible before any side effect — is unverified.
  2. Single-patch model. Nothing asserts how many Status().Patch calls occur per reconcile path. A second status write per reconcile is legal Go and would pass; the single-snapshot/single-flush invariant is unverified.
  3. FailedPhase == "" means retry, not terminal. node_update_test.go asserts a NodeUpdate plan's FailedPhase is empty, but no test asserts node.Status.Phase remains Running after a NodeUpdate task fails terminally — the actual retry-not-terminal behavior is unverified.

Impact

These are correctness/idempotency invariants: plan-creation atomicity (external observers must see the plan before side effects), single-patch staleness safety (paired with the optimistic-lock rule), and NodeUpdate retry semantics. They are load-bearing per CLAUDE.md, but a silent regression would not be caught by the suite.

Relevant experts

  • kubernetes-specialist — controller-runtime test patterns, envtest harness.
  • idiomatic-reviewer — surfaced these during the /idiomatic pilot.

Proposed approach

  • Atomic creation: assert len(submitted) == 0 on the creating reconcile in TestReconcile_CreatesPlanOnFirstRun, and fix the misleading comment.
  • Single-patch: assert exactly one Status().Patch per reconcile path (a counting fake client, or a reconcile-level assertion).
  • Retry semantics: assert the node stays Running after a NodeUpdate task fails terminally.

Out of scope

  • The doc.go documentation change itself (sei-k8s-controller#391).
  • A stale-write race test for the optimistic-lock invariant — needs an envtest concurrency harness; separate and harder.

References

  • Surfaced by the Tide /idiomatic expert pilot: sei-protocol/Tide#126
  • Invariants documented in: sei-k8s-controller#391

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions