fix(verify): tolerate NTFS live-read skew in verify_parity (live mode only) #487
Merged
Live parity compares two independent reads of a mutating volume (C++ warm,
then Rust cold), so a directory's accessed timestamp and its subtree
size/count aggregates — and anything under an NTFS/OS-volatile path —
legitimately differ between the two reads. That flipped clean SUPERSET
drives (E/G/S) to MISMATCH on a single benign diff, while M self-corrected
on re-run: the tell-tale of read-time skew, not a parser bug.
Add an NTFS-aware reconciliation step (LIVE MODE ONLY) that pairs the
C++/Rust diff by path and tolerates pairs whose only differences are:
* the accessed timestamp (directory atime bumps on traversal),
* for a directory, the subtree size/count aggregates, and
* any NTFS/OS-volatile path ($Extend / $UsnJrnl / $LogFile / $MFT,
$RECYCLE.BIN, System Volume Information, pagefile/hiberfil/swapfile, AV
sandboxes, and this harness's own scratch dirs).
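The volatile-path test in the last bullet could be sketched roughly as below. This is only an illustration, not the harness's actual code: the names `volatile_reason`, `VOLATILE_COMPONENTS`, `VOLATILE_ROOT_FILES`, and `extra_components` are hypothetical, and the AV sandbox and scratch directory names are left as a caller-supplied set because the description does not name them.

```python
from pathlib import PureWindowsPath

# NTFS metadata and OS-managed directories named in the description.
# Compared case-insensitively, since NTFS lookups normally ignore case.
VOLATILE_COMPONENTS = {
    "$extend", "$usnjrnl", "$logfile", "$mft",
    "$recycle.bin", "system volume information",
}

# Paging and hibernation files; only treated as volatile at the volume root.
VOLATILE_ROOT_FILES = {"pagefile.sys", "hiberfil.sys", "swapfile.sys"}


def volatile_reason(path, extra_components=frozenset()):
    """Return a human-readable reason if `path` is volatile, else None.

    `extra_components` is a hypothetical hook for AV sandbox and harness
    scratch directory names, which are not listed in the source text.
    """
    p = PureWindowsPath(path)
    # Drop the drive/root anchor ("E:\\") and normalise case.
    parts = [part.lower() for part in p.parts if part != p.anchor]
    volatile = VOLATILE_COMPONENTS | {c.lower() for c in extra_components}
    for part in parts:
        if part in volatile:
            return f"volatile path component: {part}"
    if len(parts) == 1 and parts[0] in VOLATILE_ROOT_FILES:
        return f"volatile system file: {parts[0]}"
    return None
```

Returning a reason string rather than a bare boolean keeps each tolerated diff explainable in the report.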
Tolerated diffs are reported with their reason (no hiding); hardlink-only
paths are untouched, so a genuine superset stays a superset.
Offline (frozen capture) deliberately does NOT reconcile: there a diff is a
real parser disagreement, so the known C: / F: issues still surface.
Verified against the real failing rows from a live run plus negative cases
(a real file-size or directory-flags diff is NOT tolerated).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stop verify_parity from crying wolf on live drives
Live parity (--live) reads each volume twice, C++ (warm) then Rust (cold),
seconds apart, on a volume that keeps mutating in between. So a
directory's accessed timestamp and its subtree size/count aggregates,
and anything under an NTFS/OS-volatile path, legitimately differ between
the two reads. The classifier counted any such diff as a MISMATCH, flipping
clean SUPERSET drives (E/G/S) to red on a single benign row, and M
self-corrected on a re-run: the tell-tale of read-time skew rather than a
parser bug. Offline parity (frozen capture) was always clean.
Fix (live mode only)
An NTFS-aware reconciliation step pairs the C++/Rust diff by path and
tolerates pairs whose only differences are read-time skew:
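* the accessed timestamp (directory atime bumps on traversal),
* for a directory, the subtree size/count aggregates (derived from
children that may have changed in the gap), and
* any NTFS/OS-volatile path: $Extend / $UsnJrnl / $LogFile / $MFT,
$RECYCLE.BIN, System Volume Information, paging files, AV sandboxes,
and this harness's own scratch dirs.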
Tolerated diffs are reported with their reason (no silent hiding);
hardlink-only paths are untouched, so a genuine superset stays a superset.
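A minimal sketch of the pairing and field-level tolerance described above, assuming each read yields one record per path. The `Row` layout, its field names, and the functions `tolerance_reason` and `reconcile` are hypothetical stand-ins rather than the harness's real data model. Paths present in only one read (for example hardlink-only paths) are never examined here, so they are left for the classifier.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical record layout; the real harness may differ.
    path: str
    is_dir: bool
    size: int
    flags: int
    modified: int
    accessed: int
    subtree_size: int = 0
    subtree_count: int = 0


ALWAYS_SKEWED = {"accessed"}                        # atime bumps on traversal
DIR_ONLY_SKEWED = {"subtree_size", "subtree_count"}  # aggregates over children


def tolerance_reason(cpp: Row, rust: Row) -> Optional[str]:
    """Return a reason if the pair differs only in skew-prone fields."""
    diff = {f.name for f in fields(Row)
            if getattr(cpp, f.name) != getattr(rust, f.name)}
    allowed = set(ALWAYS_SKEWED)
    if cpp.is_dir and rust.is_dir:
        allowed |= DIR_ONLY_SKEWED
    if diff and diff <= allowed:
        return "read-time skew: " + ", ".join(sorted(diff))
    return None


def reconcile(cpp_rows, rust_rows):
    """Pair rows by path; split differing pairs into tolerated and remaining."""
    cpp_by_path = {r.path: r for r in cpp_rows}
    rust_by_path = {r.path: r for r in rust_rows}
    tolerated, remaining = [], []
    # Only paths seen in both reads are paired; one-sided paths are untouched.
    for path in sorted(cpp_by_path.keys() & rust_by_path.keys()):
        c, r = cpp_by_path[path], rust_by_path[path]
        if c == r:
            continue
        reason = tolerance_reason(c, r)
        if reason:
            tolerated.append((path, reason))  # reported, not hidden
        else:
            remaining.append(path)
    return tolerated, remaining
```

In this sketch a file-size diff or a directory-flags diff lands in `remaining`, because neither field is in the allowed set, which mirrors the negative cases listed in the description.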
Deliberately scoped
Offline (frozen capture) does not reconcile — there a diff is a real
parser disagreement, so the known C: / F: issues still surface unchanged.
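The live-only scoping can be pictured as a gate in front of the classifier. The sketch below is an assumption-heavy illustration: the function `classify`, its parameters, and the `MATCH` status name are hypothetical, and it assumes that SUPERSET means only one-sided extra paths remain once paired diffs are accounted for.

```python
from typing import Callable, List, Tuple


def classify(paired_diffs: List[str], extra_paths: List[str], *,
             live: bool,
             tolerated: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Return (status, tolerated paths to report).

    paired_diffs: paths whose C++ and Rust rows differ.
    extra_paths:  paths present in only one read (e.g. hardlink-only).
    tolerated:    predicate for benign read-time skew; consulted in live
                  mode only, so offline diffs always count.
    """
    reported, blocking = [], []
    for path in paired_diffs:
        if live and tolerated(path):
            reported.append(path)   # surfaced with the report, not hidden
        else:
            blocking.append(path)
    if blocking:
        return "MISMATCH", reported
    if extra_paths:
        return "SUPERSET", reported
    return "MATCH", reported        # hypothetical name for a clean result
```

With the same inputs, `live=False` yields `MISMATCH` for a diff that `live=True` would tolerate, which is the behaviour that keeps real parser disagreements visible on frozen captures.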
Verified against the real failing rows from the live run, plus negative
cases: a real file-size or directory-flags diff is not tolerated.
🤖 Generated with Claude Code