
fix(verify): tolerate NTFS live-read skew in verify_parity (live mode only)#487

Merged: githubrobbi merged 2 commits into main from fix/verify-parity-live-skew on Jun 28, 2026


@githubrobbi

Collaborator

Stop verify_parity from crying wolf on live drives

Live parity (--live) reads each volume twice — C++ (warm) then Rust
(cold), seconds apart — on a volume that keeps mutating in between. So a
directory's accessed timestamp and its subtree size/count aggregates,
and anything under an NTFS/OS-volatile path, legitimately differ between
the two reads. The classifier counted any such diff as a MISMATCH, flipping
clean SUPERSET drives (E/G/S) to red on a single benign row, while M
self-corrected on a re-run: the tell-tale of read-time skew rather than a
parser bug. Offline parity (frozen capture) was always clean.

Fix (live mode only)

An NTFS-aware reconciliation step pairs the C++/Rust diff by path and
tolerates pairs whose only differences are read-time skew:

  • the accessed timestamp (directory atime bumps on traversal),
  • for a directory, the subtree size/count aggregates (derived from
    children that may have changed in the gap), and
  • any NTFS/OS-volatile path: $Extend/$UsnJrnl/$LogFile/$MFT,
    $RECYCLE.BIN, System Volume Information, paging files, AV sandboxes,
    and this harness's own scratch dirs.

Tolerated diffs are reported with their reason (no silent hiding);
hardlink-only paths are untouched, so a genuine superset stays a superset.
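
The reconciliation rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Row` shape, the field names, the `VOLATILE_PARTS` subset, and the `reconcile` / `skew_reason` helpers are all hypothetical, and the sketch assumes that one-sided rows (such as hardlink-only paths) are simply passed through.

```python
from dataclasses import dataclass, fields
from pathlib import PureWindowsPath
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    # Hypothetical diff-row shape; the harness's real columns are not shown in this PR.
    path: str
    is_dir: bool
    size: int = 0
    flags: int = 0
    accessed: str = ""
    subtree_size: int = 0
    subtree_count: int = 0


# Illustrative subset of the volatile names listed above (matched case-insensitively).
VOLATILE_PARTS = {
    "$extend", "$usnjrnl", "$logfile", "$mft", "$recycle.bin",
    "system volume information", "pagefile.sys", "hiberfil.sys", "swapfile.sys",
}
SKEW_FIELDS = {"accessed"}
DIR_SKEW_FIELDS = {"subtree_size", "subtree_count"}


def is_volatile(path: str) -> bool:
    """True if any component of the path is a known volatile name."""
    return any(part.lower() in VOLATILE_PARTS for part in PureWindowsPath(path).parts)


def skew_reason(cpp: Row, rust: Row) -> Optional[str]:
    """Return why a differing pair is tolerable, or None if it is a real diff."""
    if is_volatile(cpp.path):
        return "volatile path"
    differing = {f.name for f in fields(Row) if getattr(cpp, f.name) != getattr(rust, f.name)}
    allowed = set(SKEW_FIELDS)
    if cpp.is_dir and rust.is_dir:
        allowed |= DIR_SKEW_FIELDS
    if differing and differing <= allowed:
        return "read-time skew in " + ", ".join(sorted(differing))
    return None


def reconcile(
    cpp_only: Iterable[Row], rust_only: Iterable[Row]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, Optional[Row], Optional[Row]]]]:
    """Pair diff rows by path; return (tolerated with reasons, remaining diffs)."""
    cpp_by_path = {r.path: r for r in cpp_only}
    rust_by_path = {r.path: r for r in rust_only}
    tolerated, remaining = [], []
    for path in sorted(cpp_by_path.keys() | rust_by_path.keys()):
        cpp, rust = cpp_by_path.get(path), rust_by_path.get(path)
        # Unpaired rows (e.g. hardlink-only paths) are never tolerated here.
        paired = cpp is not None and rust is not None
        reason = skew_reason(cpp, rust) if paired else None
        if reason is not None:
            tolerated.append((path, reason))
        else:
            remaining.append((path, cpp, rust))
    return tolerated, remaining
```

With made-up sample rows, a directory whose `accessed` and subtree aggregates moved between reads lands in `tolerated` with its reason, while a file-size diff or a one-sided row stays in `remaining` and still reaches the classifier.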

Deliberately scoped

Offline (frozen capture) does not reconcile — there a diff is a real
parser disagreement, so the known C: / F: issues still surface unchanged.
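
The live-only scoping could be expressed as a small gate in front of the classifier. The sketch below is an assumption about shape only: `apply_mode` and its `explain` callback are hypothetical names, not functions from this PR.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def apply_mode(
    diff: Iterable[T],
    live: bool,
    explain: Callable[[T], Optional[str]],
) -> Tuple[List[T], List[Tuple[T, str]]]:
    """Return (rows the classifier should still see, tolerated rows with reasons)."""
    if not live:
        # Offline (frozen capture): every diff is a real disagreement, so pass it through.
        return list(diff), []
    kept: List[T] = []
    tolerated: List[Tuple[T, str]] = []
    for row in diff:
        reason = explain(row)
        if reason is None:
            kept.append(row)
        else:
            # Tolerated rows are still reported, together with the reason.
            tolerated.append((row, reason))
    return kept, tolerated
```

Keeping the gate outside the tolerance rules means the offline path never calls `explain` at all, so it cannot be weakened by accident.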

Verified against the real failing rows from the live run, plus negative
cases: a real file-size or directory-flags diff is not tolerated.

🤖 Generated with Claude Code

… only)

Live parity compares two independent reads of a mutating volume (C++ warm,
then Rust cold), so a directory's accessed timestamp and its subtree
size/count aggregates — and anything under an NTFS/OS-volatile path —
legitimately differ between the two reads. That flipped clean SUPERSET
drives (E/G/S) to MISMATCH on a single benign diff, while M self-corrected
on re-run: the tell-tale of read-time skew, not a parser bug.

Add an NTFS-aware reconciliation step (LIVE MODE ONLY) that pairs the
C++/Rust diff by path and tolerates pairs whose only differences are:
  * the accessed timestamp (directory atime bumps on traversal),
  * for a directory, the subtree size/count aggregates, and
  * any NTFS/OS-volatile path ($Extend / $UsnJrnl / $LogFile / $MFT,
    $RECYCLE.BIN, System Volume Information, pagefile/hiberfil/swapfile, AV
    sandboxes, and this harness's own scratch dirs).

Tolerated diffs are reported with their reason (no hiding); hardlink-only
paths are untouched, so a genuine superset stays a superset.

Offline (frozen capture) deliberately does NOT reconcile: there a diff is a
real parser disagreement, so the known C: / F: issues still surface.

Verified against the real failing rows from a live run plus negative cases
(a real file-size or directory-flags diff is NOT tolerated).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@githubrobbi githubrobbi enabled auto-merge June 28, 2026 01:22
@githubrobbi githubrobbi added this pull request to the merge queue Jun 28, 2026
Merged via the queue into main with commit 52ea8b9 Jun 28, 2026
27 checks passed
@githubrobbi githubrobbi deleted the fix/verify-parity-live-skew branch June 28, 2026 01:39