Trace cmp substrate by twof · Pull Request #54 · doordash-oss/PropertyTestingKit

twof · 2026-06-19T00:28:15Z

No description provided.

Mutator.mutate becomes (Value, inout FastRNG) -> Value: one mutant per call, variety from the RNG. Effort per seed (burst size, stacking) now belongs to the engine, not the mutator: selectForMutation queues a fixed burst (mutationBurstLength = 16) of single-step mutants, each mutating one randomly chosen pack position via mutateOnePosition. Built-in conformances and specialty mutators keep their candidate enumerations and pick one per call; compose picks a random component; ScheduleByteMutator picks one strategy per call. Test fixtures that asserted exhaustive enumeration now assert membership/coverage over draws (FastRNG is thread-local and unseedable). Groundwork for the pool scheduler (focus + counter): a selection becomes a unit of requested work, so burst size and mutation depth can become scheduler knobs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
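
The single-step contract described above can be sketched as follows. This is only an illustration: `SketchRNG` is a hypothetical seedable stand-in for FastRNG (which the commit says is thread-local and unseedable), and `SketchMutator` and `queueBurst` are invented names, not the library's API.

```swift
// Hypothetical stand-in for FastRNG: a tiny seedable SplitMix64 generator.
struct SketchRNG: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// One mutant per call; all variety comes from the RNG.
struct SketchMutator<Value> {
    let mutate: (Value, inout SketchRNG) -> Value
}

// The engine, not the mutator, decides how many mutants a seed gets.
func queueBurst<Value>(seed: Value, mutator: SketchMutator<Value>,
                       rng: inout SketchRNG, length: Int = 16) -> [Value] {
    var burst: [Value] = []
    burst.reserveCapacity(length)
    for _ in 0..<length {
        burst.append(mutator.mutate(seed, &rng))
    }
    return burst
}

// A toy Int mutator that nudges the seed by a small random amount.
let intMutator = SketchMutator<Int> { value, rng in
    value &+ Int.random(in: -4...4, using: &rng)
}
var burstRNG = SketchRNG(state: 42)
let mutants = queueBurst(seed: 100, mutator: intMutator, rng: &burstRNG)
```

Because the burst length is an argument of the engine-side function, it can later become a scheduler knob without touching any mutator.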

Mutation scheduling moves off the plugin bus into a per-engine scheduler component: fuzz(scheduler: .weightedPool()) owns the pool of interesting inputs, draws with focus+counter bursts (one fresh generation between bursts), and composes policy inside it. PoolAdmission decides membership; child PoolPlugins advise weights/evictions via owner-mediated actions and hear every membership change (.inserted/.removed re-broadcast). Children are non-generic: events carry entry IDs and coverage, never typed inputs, so policies work under any input pack and schedule fuzzing.

The engine consults the scheduler only when the residual queue (seeds, queueInputs, bus bursts) is empty; queue semantics are unchanged, so stopWhenQueueEmpty replay and selectForMutation lineage still hold. Pool mutants report a new IterationContext.poolParentID, deliberately a separate namespace from parentID (bus-plugin originIDs).

corpusMutation and energyMutation are deleted: .weightedPool() with .everyDiscovery admission and focusOnInsert is the corpusMutation loop, and the Entropic scoring math (kept, with its characterization tests) becomes a pool weight advisor next. Default plugins are now empty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PoolAdmission.featureOwnership — libFuzzer's corpus model: an accepted input joins the pool only by owning >= 1 coverage feature (unowned, or stolen from a strictly larger owner; covered-edge count is the REDUCE metric, ties don't steal). An entry losing its last feature is evicted through the same removal path as child evictions, so every policy hears it. Bounds the mutation pool by the feature space regardless of the coverage strategy's acceptance rate; rejected accepts get no burst and no residence (strict semantics). Admission verdicts can now carry evictions (PoolAdmission.Verdict); .everyDiscovery is unchanged in behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
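
A minimal sketch of the ownership rule described above, assuming entries are identified by integer IDs and each carries one REDUCE metric. `SketchOwnershipLedger` and its `admit` method are hypothetical names for illustration and are not the library's FeatureOwnershipLedger.

```swift
// Hypothetical sketch of a feature-ownership ledger (not the library's type).
struct SketchOwnershipLedger {
    private var owner: [UInt64: Int] = [:]   // feature -> owning entry ID
    private var size: [Int: Int] = [:]       // entry ID -> REDUCE metric
    private var owned: [Int: Int] = [:]      // entry ID -> number of features owned

    /// Returns whether `entry` joined the pool, plus any entries it bankrupted.
    mutating func admit(entry: Int, features: [UInt64], size entrySize: Int)
        -> (admitted: Bool, evicted: [Int]) {
        var evicted: [Int] = []
        var claimed = 0
        for feature in Set(features) {
            if let current = owner[feature] {
                // Steal only from a strictly larger owner; ties do not steal.
                guard let currentSize = size[current], entrySize < currentSize else { continue }
                owned[current, default: 0] -= 1
                if owned[current] == 0 {
                    // The previous owner lost its last feature: evict it.
                    owned[current] = nil
                    size[current] = nil
                    evicted.append(current)
                }
            }
            owner[feature] = entry
            claimed += 1
        }
        guard claimed > 0 else { return (false, []) }
        size[entry] = entrySize
        owned[entry] = claimed
        return (true, evicted)
    }
}

var ledger = SketchOwnershipLedger()
let firstAdmit = ledger.admit(entry: 1, features: [10, 11], size: 8)
let tieAdmit = ledger.admit(entry: 2, features: [10, 11], size: 8)      // same size: no steal
let smallerAdmit = ledger.admit(entry: 3, features: [10, 11], size: 5)  // strictly smaller: steals
```

Since every resident must own at least one feature and each feature has one owner, the pool can never hold more entries than there are features.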

EntropicWeightPolicy (PoolPlugin): yield/executions keyed by pool entry ID off the event stream — .iteration with a pool parent attributes executions and credits discovered features to the parent's yield (even when admission rejects the mutant: a rejected discovery is still information about the parent's neighborhood); .inserted registers the entry and updates global feature frequencies; .willDraw flushes weights (rarity terms cached, abundance fresh per draw); .removed entries stop receiving weights but keep stats for in-flight lineage. Scoring math unchanged (entropicWeightCombining & co., pinned by the existing characterization vectors). Sugar: policies: { [.entropic()] }. This is the composition the bus architecture couldn't express: entropic selection over a feature-ownership-culled pool. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ckets)

A coverage strategy can now publish the vocabulary the mutation pool accounts feature ownership in, instead of the ledger always falling back to bare edge indices:

- CoverageEngine gains an optional features closure, collected inside the same gated window as an accepting decide.
- .pathTrie publishes sliding k-grams (default k=2, configurable via .pathTrie(gramLength:)) of the ordered first-hit path. PathTrie now records the path and judges-and-collects in one critical section. Gram hashes are deterministic position-dependent FNV-1a (PathGrams).
- .hitCountBuckets publishes (edge << 8 | bucketBit) pairs.
- Features widen to UInt64 end-to-end (ledger, entropic policy + math); PoolIterationOutcome.resolvedFeatures is the single vocabulary every pool component reads, falling back to widened edge indices.

Rationale: pathTrie accepts on path novelty but the pool culled on edge sets, capping retention at one entry per edge (142 on fsub). K-grams match the retained diversity to the acceptance criterion (measured: 1.1k features at k=2, decaying accept rate vs pathTrie's flat 37%).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
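
To make the k-gram vocabulary concrete, here is a rough sketch of sliding grams over an ordered first-hit path, hashed with 64-bit FNV-1a. The function names are hypothetical, and the byte order fed to the hash is an assumption; the actual PathGrams encoding may differ.

```swift
// FNV-1a over the edges of one gram, in order, so order within the gram matters.
func fnv1a(_ edges: ArraySlice<UInt32>) -> UInt64 {
    var hash: UInt64 = 0xCBF2_9CE4_8422_2325          // FNV-1a 64-bit offset basis
    for edge in edges {
        for shift in stride(from: 0, to: 32, by: 8) { // little-endian bytes (assumed)
            hash ^= UInt64((edge >> shift) & 0xFF)
            hash = hash &* 0x0000_0100_0000_01B3      // FNV-1a 64-bit prime
        }
    }
    return hash
}

// One feature per window of `k` consecutive first-hit edges.
func slidingGrams(path: [UInt32], gramLength k: Int = 2) -> [UInt64] {
    guard k > 0, path.count >= k else { return [] }
    return (0...(path.count - k)).map { start in
        fnv1a(path[start..<(start + k)])
    }
}

let grams = slidingGrams(path: [3, 7, 7, 9])   // windows (3,7), (7,7), (7,9)
```

A path of n edges yields n - k + 1 grams, which is why a finer vocabulary raises the number of ownable features far above the raw edge count.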

nil publishes no features at all — the pool falls back to covered edge indices, the coarsest (most flood-controlling) setting. Needed both as the user-facing opt-out and to compare vocabularies within one build. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Probe findings on fsub (10s, single engine, culled admission): the k-gram vocabulary grew the pool 44 -> 500 entries, admission 6% -> 48% of accepts, burst completion 97% -> 60%, and - the dominant harm - executed terms drifted 2.2x bigger (wire 174 -> 391), halving iteration throughput (39k -> 15k). Root cause: vocabulary size IS the population ceiling (every resident owns >= 1 feature, one owner per feature), so refining the vocabulary to distinguish inputs better inseparably raised the cap. capacity: Int? on .weightedPool() decouples the two: admission still decides WHO is distinctive in the strategy's vocabulary; the bound decides HOW MANY stay. Overflow evicts the lowest-weight resident (ties: oldest; never the newcomer), and REDUCE bankruptcies run first so they can spare an innocent. Evicted owners stay GHOSTS: their feature claims persist. The alternative (releasing claims on eviction) was probed and rejected - re-opened features made every accept a re-claimant (admission 48% -> 91%, 1.3k evictions/10s, burst completion 34%, throughput down again): a revolving-door FIFO. A represented feature stays represented; only genuinely new features or strictly smaller witnesses win residence. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The pool's only size metric was the covered-edge count, which saturates once coverage does (fsub: median ~100/142 for every entry): REDUCE ties never steal, eviction can't see bloat, and mutant-of-mutant term drift is invisible to every pool mechanism (probed: executed term wire 174 -> 391, throughput 39k -> 15k iters/10s under a fine vocabulary).

- Mutator gains an optional `size` closure (the workload knows its value's real size); compose/combined propagate any component's measure.
- The engine sums measured sizes across the input pack, only on accepted runs, into PoolIterationOutcome.inputSize.
- FeatureOwnershipLedger judges REDUCE on the real size when present (covered-edge count stays the fallback), so smaller witnesses steal features even when coverage counts tie.
- The capacity victim is now lowest weight, then LARGEST measured input, then newest, so eviction targets the drift monsters directly. Unmeasured pools keep the elder-anchoring evict-newest rule; the edge-count proxy deliberately never feeds the eviction order (more covered edges mark a better entry, not a worse one).

Also fixed in the toolchain fork (3e4ce3824e6): storing an Optional-of-closure field in a generic struct made SILCombine's witness devirtualization crash at pack-element call sites (layoutIsTypeDependent judged enum payloads by unsubstituted interface type).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
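
The victim order for measured pools (lowest weight, then largest measured input, then newest) can be expressed as a single comparison. `SketchResident` and `capacityVictim` are hypothetical names, and the sketch assumes that a larger ID means a newer entry.

```swift
struct SketchResident {
    let id: Int            // assumed insertion order: larger means newer
    let weight: Double
    let measuredSize: Int
}

// Picks the resident to evict when the pool exceeds its capacity.
func capacityVictim(among residents: [SketchResident]) -> SketchResident? {
    residents.min { a, b in
        if a.weight != b.weight { return a.weight < b.weight }                      // lowest weight first
        if a.measuredSize != b.measuredSize { return a.measuredSize > b.measuredSize } // then largest input
        return a.id > b.id                                                          // then newest
    }
}

let residents = [
    SketchResident(id: 1, weight: 0.5, measuredSize: 10),
    SketchResident(id: 2, weight: 0.5, measuredSize: 40),
    SketchResident(id: 3, weight: 0.9, measuredSize: 99),
    SketchResident(id: 4, weight: 0.5, measuredSize: 40),
]
let victim = capacityVictim(among: residents)
```

Entry 3 is the largest but survives because weight is compared first; among the lowest-weight entries, size and then recency break the tie.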

… opt-in)

10-trial replication killed every 3-trial k-gram win:

- fsub pathTrie kgram+cap64: 90.0%, 2 never-solved vs edges 91.7%, 0 (the original '93.5%/0 beats baseline' cell was favorable noise)
- stlc pathTrie kgram unbounded: 84.5%/0 vs edges 84.0%/0 (the original '91.7 vs 86.7' win was noise; the stlc probe shows the configs differ only in residence, admission is identical, and stlc merely tolerates the drift instead of profiting)

So .pathTrie publishes no vocabulary by default; pathTrie(gramLength:) remains the opt-in (pair it with the scheduler's capacity: bound, because vocabulary size is otherwise the pool's population ceiling). hitCountBuckets drops its (edge, bucket) features entirely: that vocabulary equals its acceptance criterion, so every accept owned a fresh feature and culling silently turned off (fsub regressed exactly to its unculled rate, 71.3% vs 86.1% edge-culled). Edge fallback is the working configuration; a coarser bucket-only vocabulary is deliberately not pursued after the universal no-benefit result.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Adds the comparison-coverage (cmplog/value-profile) substrate's first layer: the trace-cmp half of the SanitizerCoverage hooks, mirroring the existing edge recorder path. SanitizerCoverage's __sanitizer_cov_trace_cmp* hooks deliver the operands of each instrumented integer comparison, giving a gradient (e.g. popcount(a^b) as an input nears a boundary `i < c`) that pure edge coverage is blind to.

- New SanCovCmpRecorder slot on SanCovMeasurementContext (independent of the edge recorder slot), with the same attach/reset/release lifecycle.
- __sanitizer_cov_trace_{,const_}cmp{1,2,4,8} + trace_switch capture the call PC via __builtin_return_address(0) and route operands through sancov_dispatch_cmp to the context's cmp recorder.
- CmpRecorderTests: attach round-trip, slot independence, dispatch routing, reset/release lifecycle (7 tests, all green).

Validated separately that real instrumented Swift fires these hooks with usable operands (standalone probe, edge,trace-cmp): the i<c gradient is exactly the missing signal for the de Bruijn boundary bugs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Builds the Swift half of comparison coverage on top of the trace-cmp C hooks.

- ComparisonObserver: the cmp analog of EdgeObserver, a strategy's onCompare closure attached to a measurement context that co-owns it (retain at attach, release at last reference). Rides the independent cmp recorder slot, gated by the same per-thread observer gate as edges.
- CoverageEngine gains onCompare; makeEvaluator attaches a comparison observer when set, routing onReset to whichever observer (edge or cmp) is the sole one.
- comparisonCoverage strategy: records (comparison-site PC, popcount(a ^ b)) per comparison (libFuzzer value profile), interesting iff a new such feature OR a new edge (union with .newEdge). The Hamming-distance gradient drives mutation toward a boundary `i < c` even when the edge set is unchanged. Publishes no culling vocabulary (avoids the acceptance==vocabulary tautology, per hcb).

11 new tests (ComparisonObserverTests, ComparisonCoverageStrategyTests), full suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
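
As an illustration of the (comparison-site PC, popcount(a ^ b)) feature, the following sketch tracks which pairs have been seen. `SketchValueProfile` is a hypothetical type, and packing the distance into the low 7 bits is an assumption made for the example, not the strategy's actual encoding.

```swift
struct SketchValueProfile {
    private var seen: Set<UInt64> = []

    // Hypothetical packing: upper bits identify the site, low 7 bits hold the
    // Hamming distance (0...64). The top 7 bits of the PC are lost in this sketch.
    static func feature(pc: UInt64, _ a: UInt64, _ b: UInt64) -> UInt64 {
        let hamming = UInt64((a ^ b).nonzeroBitCount)
        return (pc << 7) | hamming
    }

    /// True when this comparison produced a (site, distance) pair not seen before.
    mutating func observe(pc: UInt64, _ a: UInt64, _ b: UInt64) -> Bool {
        seen.insert(Self.feature(pc: pc, a, b)).inserted
    }
}

var profile = SketchValueProfile()
let constant: UInt64 = 0b1111
let farHit = profile.observe(pc: 0x1000, 0b0000, constant)    // distance 4: new
let nearHit = profile.observe(pc: 0x1000, 0b0111, constant)   // distance 1: new
let againHit = profile.observe(pc: 0x1000, 0b1110, constant)  // distance 1 again: not new
```

Each new distance at a site counts as novel, which gives the gradient toward the boundary but also means every distinct distance is accepted.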

…oat) The shift_var_leq experiment (stlc, 20s, 8 trials) shows acceptance-based value profile over-accepts: solve rate 4/8 vs newEdge's 8/8, despite being a strict superset of newEdge's acceptance. The cmp operands belong in input-to-state mutation, not in the acceptance gate. Document the result on the strategy so it isn't promoted as a default; the trace-cmp substrate remains the foundation for the I2S mutator that should actually pay off. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The real payoff lever for trace-cmp (the acceptance-based comparisonCoverage strategy under-performed): use the captured comparison operands to MUTATE inputs toward satisfying the comparison, the auto-dictionary / RedQueen mechanism.

- ComparisonDictionary: a bounded, thread-safe ring of recently-seen comparison operands, published via the `current` task-local. The engine attaches a comparison observer that feeds it (when I2S is enabled and the cmp slot is free) and binds it around the mutation loop.
- The framework Int mutator samples `current` for both mutate and generate: half the draws jump to a recorded operand (or a ±1 neighbour, for `<`/`<=` boundary bugs), the rest stay ordinary so I2S guides without starving search.
- Opt-in via the `inputToStateEnabled` task-local (isolated per campaign, so no process-global race) or the PTK_INPUT_TO_STATE env var (launch-time).
- PropertyTestingKitTests built with trace-cmp so the integration test exercises the real cmp hooks: I2S reaches a ~10^14 magic-value bug in 3s that random search cannot. Plus unit tests for the dictionary and the Int mutator.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
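
A rough sketch of the input-to-state draw described above: roughly half the draws jump to a recorded operand or one of its ±1 neighbours, and the rest apply an ordinary small mutation. `SketchRNG`, `i2sMutate`, and the ±8 fallback range are all assumptions for illustration; the real Int mutator and ComparisonDictionary are not reproduced here.

```swift
// Hypothetical seedable generator so the sketch is reproducible.
struct SketchRNG: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// `operands` stands in for a snapshot of recently recorded comparison operands.
func i2sMutate(_ value: Int, operands: [Int], rng: inout SketchRNG) -> Int {
    if let operand = operands.randomElement(using: &rng), Bool.random(using: &rng) {
        // Jump to the operand or a ±1 neighbour (targets `<` / `<=` boundaries).
        return operand &+ Int.random(in: -1...1, using: &rng)
    }
    // Ordinary mutation, so the dictionary guides without starving search.
    return value &+ Int.random(in: -8...8, using: &rng)
}

var i2sRNG = SketchRNG(state: 7)
let guidedDraws = (0..<200).map { _ in i2sMutate(0, operands: [1_000], rng: &i2sRNG) }
let plainDraws = (0..<200).map { _ in i2sMutate(0, operands: [], rng: &i2sRNG) }
```

With an empty dictionary the function degrades to the ordinary mutation, which matches the opt-in nature of the feature.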

…ue axis)

A directional, competitive admission signal on comparison operands, in contrast to comparisonCoverage's value-profile acceptance (which keeps every novel distance and bloats; finding 16). Each comparison site (pc) becomes an ownership feature owned by the input that drove its operands closest together (lowest |arg1 - arg2|, absolute numeric difference, overflow-safe); a strictly closer input steals, ties don't, so the owner only ever gets closer and churn terminates: the value-axis analog of REDUCE. Additive over edge ownership: one entry roster, evict when owning nothing in either dimension, so the measurable delta vs featureOwnership is purely the boundary dimension.

- BoundaryDistanceLedger + PoolAdmission.boundaryDistanceOwnership
- CoverageStrategy.boundaryDistance: accepts on newEdge OR a strict per-site distance improvement (monotone, not novelty); publishes the run's per-site minimum distance for the ledger to cull on
- boundaryDistances channel: CoverageEngine -> CoverageAcceptance -> PoolIterationOutcome -> admission (judge generalized to the outcome; internal-only signature change, public PoolAdmission API unchanged)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
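
The per-site "closest approach owns the site" rule can be sketched like this. `SketchBoundaryLedger` is a hypothetical type, and the sketch assumes operands are compared as unsigned 64-bit values; it is not the library's BoundaryDistanceLedger.

```swift
struct SketchBoundaryLedger {
    private(set) var owner: [UInt64: (entry: Int, distance: UInt64)] = [:]

    // Absolute difference without overflow: always subtract the smaller operand.
    static func distance(_ a: UInt64, _ b: UInt64) -> UInt64 {
        a > b ? a - b : b - a
    }

    /// Returns true when `entry` now owns the comparison site `pc`.
    mutating func offer(entry: Int, pc: UInt64, _ a: UInt64, _ b: UInt64) -> Bool {
        let candidate = Self.distance(a, b)
        // Only a strictly closer input steals; ties leave the owner in place.
        if let current = owner[pc], candidate >= current.distance { return false }
        owner[pc] = (entry, candidate)
        return true
    }
}

var boundary = SketchBoundaryLedger()
let firstOffer = boundary.offer(entry: 1, pc: 0x2000, 10, 50)   // distance 40: unowned site
let closerOffer = boundary.offer(entry: 2, pc: 0x2000, 30, 50)  // distance 20: steals
let tiedOffer = boundary.offer(entry: 3, pc: 0x2000, 70, 50)    // distance 20: tie, no steal
```

Because the recorded distance per site can only decrease, the number of ownership changes per site is bounded, which is the termination argument the commit makes.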

…t fix

Productivity-weighted, adaptive-depth pool policy (two per-seed scores: draw weight + mutation-depth cascade). Defaults tuned to alpha=0.02, ceiling=45 (swept: doubles solve rate on compound-structure bugs vs the original 0.05/90, which over-escalated depth into the 0%-productive tail). Eval: best config overall (1700/1752 with the workload recursive mutators).

- AdaptiveDepthMath/Policy + .setMutationDepth/.inserted(parent,claimed) plumbing across PoolPlugin/WeightedPoolCore/Feature+BoundaryDistance ledgers/Entropic.
- chainMutate(depth:) in FuzzStateMachine; .mutate reads per-entry depth.
- SchedulerProbe: per-iteration (source, depth, accepted) hook (zero-cost when nil) for productivity/depth/gen-vs-mutation diagnostics.
- FuzzStatsAccountingTests: fix flaky seeds/mutations assertions under full-suite cooperative-pool starvation (min(seedCount,total) + guard past the seed phase).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-covered diagnostic

Two SanCov changes from resolving "do failing runs reach full SUT coverage?":

1. Crash fix: sancov_dispatch_cmp had no re-entry guard. A cmp recorder compiled into a trace-cmp module (the test target is) fires comparisons in its OWN body; each re-entered sancov_dispatch_cmp -> recorder -> ... without bound, overflowing the stack (SIGBUS, ~500 frames, confirmed via crash report: captureRecorder recursing 254x). Pre-existing (reproduces on clean HEAD); CmpRecorderTests crashed deterministically and took the full parallel suite down with it. Add tls_in_cmp_recorder (the cmp twin of tls_in_edge_observer), set across the recorder call and the cmp reset hook so a recorder/hook can never re-dispatch into itself. CmpRecorderTests.dispatch test now snapshots+detaches before asserting (the trace-cmp-instrumented #expect comparisons would otherwise re-fire the recorder). Full suite green 3x (475 tests, no signal).

2. Diagnostic: process-global "ever-covered" edge bitmap (g_ever_covered, set in sancov_dispatch_edge post-filter, never cleared by the engine's per-iteration reset; default-NULL/disabled = one predicted-not-taken load). Answers the true executed-edge union of a whole run, which the per-iteration context and admitted-only corpus cannot. Swift API on SanCovCounters + GlobalEverCoveredTests.

Result: stlc hard cells reach 100% SUT-logic coverage (54/54) on no-counterexample runs => edge-coverage guidance is saturated; the lever for shift_var_leq / subst_abs_no_shift is value-aware, not coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
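
The re-entry guard from the crash fix can be illustrated in Swift with a single-threaded sketch. The real guard is the C thread-local tls_in_cmp_recorder; `SketchCmpDispatcher` and its plain instance flag are hypothetical simplifications that only show the control flow.

```swift
final class SketchCmpDispatcher {
    var recorder: ((UInt64, UInt64, UInt64) -> Void)?
    // Stands in for the per-thread tls_in_cmp_recorder flag (single-threaded sketch).
    private var inRecorder = false

    func dispatch(pc: UInt64, _ a: UInt64, _ b: UInt64) {
        // Drop any comparison fired while the recorder itself is running.
        guard !inRecorder, let recorder = recorder else { return }
        inRecorder = true
        defer { inRecorder = false }
        recorder(pc, a, b)
    }
}

let dispatcher = SketchCmpDispatcher()
var recordedCount = 0
dispatcher.recorder = { pc, a, b in
    recordedCount += 1
    // An instrumented recorder body fires its own comparisons; simulate that here.
    dispatcher.dispatch(pc: pc &+ 1, a, b)
}
dispatcher.dispatch(pc: 0x40, 1, 2)
```

Without the guard, the simulated comparison inside the recorder would call the recorder again and recurse without bound, which is the stack overflow the commit describes.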

…Ownership)

Adds a third pool-ownership dimension on top of boundary distance: the k-wise three-valued SIGN combinations over near-boundary comparison sites. Distance is a per-site gradient that drives the search TOWARD a comparison's flip point; sign captures which SIDE each near-boundary site landed on: the {<,==,>} position that edge coverage collapses (== shares the not-taken branch of `a<b` with >). Witnesses for off-by-one / conjunction bugs need a JOINT state (site A on its boundary AND site B on a particular side), so the vocabulary is the pairwise sign combinations (bounded vs the intractable 3^n full product), discovery-owned so the pool holds partial witnesses and crosses them toward the conjunction. Motivated by Findings 35/37: on shift_var_leq the boundary A0 (i==c) is abundantly reachable but we never hold (A0,B+) jointly, and coverage is blind to it.

- .boundaryState(window:maxSites:) strategy: keeps boundaryDistance's gradient + edge union, adds sign-combination acceptance + publishing.
- boundaryStateOwnership admission: BoundaryDistanceLedger gains a discovery-owned sign dimension (never stolen; a qualitative state has no "closer").
- BoundarySignEncoding: three-valued sign + deterministic (non-Hasher) 1-wise and order-independent 2-wise feature hashes; near-boundary windowing + cap.
- Plumbed boundarySigns through CoverageEngine/CoverageAcceptance/PoolIterationOutcome/FuzzStateMachine.

boundaryDistanceOwnership unchanged (passes no signs → dimension inert), so the two arms A/B cleanly. Full suite 493 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A comparison site hit many times in one run (loop body / recursive descent) previously contributed only the sign at its closest approach, dropping the other sides it visited and breaking ties order-dependently. Track a near-boundary sign MASK per site instead: OR 1<<sign for every hit within `window` (near-gated in onCompare, so a far loop iteration never joins the mask). boundarySignFeatures emits one singleton per side in the mask and the cross-product of sides per pair, so a loop straddle contributes every near side it touched. min-distance gradient + window gate unchanged. Perf-neutral, so it hardens the final shape ahead of the onCompare throughput rework. 497 PTK tests green (+4: loop-straddle, far-side-excluded, mask contract). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
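
A small sketch of the per-site sign mask, assuming signed 64-bit operands and the bit layout 0 for `<`, 1 for `==`, 2 for `>`. `SketchSignMasks` is a hypothetical type; only the singleton features are shown, and the pairwise cross-product is omitted for brevity.

```swift
// Three-valued sign of one comparison: 0 for a < b, 1 for a == b, 2 for a > b.
func sign(_ a: Int64, _ b: Int64) -> UInt8 {
    a < b ? 0 : (a == b ? 1 : 2)
}

struct SketchSignMasks {
    let window: UInt64
    private(set) var mask: [UInt64: UInt8] = [:]   // site PC -> OR of (1 << sign)

    init(window: UInt64) { self.window = window }

    mutating func onCompare(pc: UInt64, _ a: Int64, _ b: Int64) {
        // Overflow-safe |a - b| via wrapping subtraction of the smaller operand.
        let distance = a > b ? UInt64(bitPattern: a &- b) : UInt64(bitPattern: b &- a)
        guard distance <= window else { return }    // far hits never join the mask
        mask[pc, default: 0] |= 1 << sign(a, b)
    }

    /// One (site, side) singleton per bit set in each site's mask.
    func singletons() -> [(pc: UInt64, side: UInt8)] {
        var result: [(pc: UInt64, side: UInt8)] = []
        for pc in mask.keys.sorted() {
            let bits = mask[pc] ?? 0
            for side in UInt8(0)..<3 where bits & (1 << side) != 0 {
                result.append((pc: pc, side: side))
            }
        }
        return result
    }
}

var masks = SketchSignMasks(window: 2)
masks.onCompare(pc: 1, 4, 5)     // near, a < b
masks.onCompare(pc: 1, 6, 5)     // near, a > b: a loop straddling the boundary
masks.onCompare(pc: 1, 5, 500)   // far: ignored
masks.onCompare(pc: 2, 0, 100)   // far: site never enters the mask
```

Keeping a mask rather than a single sign means the result no longer depends on the order in which a loop visited the two sides.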

Env-gated process-global accumulator that records, per run, the near-boundary participant count and each site's side-multiplicity, then reports the current pairwise (k≤2) vocabulary size against the full subset product (all k) and full width product (k=n). Default-off; one bool check on the feature-emission path. Measured on stlc/boundaryState: the full subset product blows up ~1.7e9× (10^11–10^14 features/run, intractable); pairwise is the right cut. See notebook Finding 40. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Profiling the cmp dispatch (notebook Finding 41) showed the per-comparison tax was the Swift.Dictionary write into currentRun (SipHash + copy-on-write ARC, ~50% of the cmp channel in release), NOT the lock (~6%), correcting the earlier guess.

Replace the per-comparison SyncBox<[UInt64: SiteApproach]> with BoundarySiteAccumulator: a concrete open-addressing PC -> (minDistance, signMask) map over raw UnsafeMutablePointer buffers (no generics, no Hasher, no bounds checks/exclusivity, no element ARC), updated via a non-generic record(), reduced once at decide via snapshot().

The accumulator MUST stay synchronised: coverage contexts are keyed by Swift task and inherited by child tasks, so a property spawning concurrent work routes cmp hooks from several threads into one context (the same reason the edge map uses an atomic CAS and pathTrie locks its trie). Kept a lock but swapped NSLock -> os_unfair_lock (NSLock's objc_msgSend was ~25% of the now-small channel).

Release cmp-dispatch self-time: 1473ms -> 543ms = 2.7x (-63%). Behavior unchanged: 502 PTK tests green (+5 accumulator tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

No-GUI CPU profiling for the comparison hot path (Instruments deep-copy is unavailable headless):

- ProfiledBenchmark: trace-cmp instrumented + a comparison-dense closure driven through real fuzz(.boundaryState); PROFILE_STRATEGY/CMP_PER_INPUT/FUZZ_MS knobs.
- scripts/aggregate-time-profile.py: parse `xctrace export` time-profile XML to self/total CPU per symbol; --under isolates a subtree's internal breakdown. Resolves Instruments' <weight>/<frame> ref= dedup while streaming.
- scripts/record-cmp-profile.sh: build -> xctrace --attach record -> export -> aggregate, end to end. Needs full Xcode (DEVELOPER_DIR=Xcode-beta).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ddr 24.5%→14.6%) On macOS every distinct _Thread_local accessed in a dylib lowers to a tlv_get_addr function call. The per-comparison hot path (sancov_dispatch_cmp → get_current_coverage_map) touched ~6 distinct thread-locals, paying ~6 tlv_get_addr per instrumented comparison — profiled at ~24% of the cmp-dispatch subtree (release). Coalesce the 10 scattered _Thread_local globals into one _Thread_local SanCovTLS struct + a sancov_tls() accessor. Each hot entry point fetches the block address ONCE and threads `ts` down through the routing helpers (get_current_coverage_map / set_tls_measurement_context / get_current_task_for_measurement / ensure_tls_coverage_map all take `ts`), so callees never re-fetch. Pure storage refactor: identical routing logic, same atomics, same locks, per-thread by construction (no new sharing). Measured (release, boundarystate, CMP_PER_INPUT=256): tlv_get_addr self-time 24.47% → 14.64% of the cmp subtree (−41%). The residual is the one unavoidable TLS fetch per comparison. Also fix scripts/aggregate-time-profile.py: it resolved <frame> and <weight> ref= dedup but not <backtrace ref=>, so back-referenced sample rows (95%+ of a hot loop's samples) got empty stacks and attributed to nothing — manufacturing a phantom "96% unsymbolicated". Now resolves all three ref levels; the same trace attributes 99.9% to the runEngines branch. Tests: SanCovTests 39 + ScheduleControlTests 32 + PropertyTestingKitTests 502 all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ead 61.5%→40.5% of process) The per-comparison cmp-dispatch path was dominated by two costs after the TLS coalescing: the os_unfair_lock taken per record() (~26% of the cmp subtree, an out-of-line libsystem call) and a retain/release pair per comparison in the recorder bridge (~11%, RefCountBitsT atomics). LOCK → LOCK-FREE (BoundarySiteAccumulator): the lock was only required because grow()'s realloc could race concurrent readers (task-inherited child tasks route cmp hooks from several threads into one accumulator). Make the table fixed-capacity (default 8192, far above any real workload's distinct cmp-site count; drops + sets didOverflow if it ever fills) so there is no realloc, then update each slot with per-slot atomics: claim via key CAS, distance via a load-then-weak-CAS min that early-outs with no RMW when the distance doesn't improve (the steady state), sign via atomic OR. Occupied-slot indices are tracked in a side list so snapshot/reset stay O(occupied). A straggler record racing reset/snapshot is memory-safe (fixed buffer) and at worst loses its own unwanted late write. ARC: the recorder bridge did Unmanaged.takeUnretainedValue().onCompare(...) per comparison, which the compiler brackets with a retain/release pair. Switch to _withUnsafeGuaranteedRef — sound because the context co-owns the observer and is alive for the whole call. Helps every cmp strategy (boundary, I2S, comparisonCoverage). Measured (release, boundarystate, CMP_PER_INPUT=256): cmp-dispatch as a fraction of whole-process CPU dropped 61.5% → 40.5%. os_unfair_lock and the ARC refcount atomics are gone from the cmp subtree; the residual is the correctness-required task-keyed routing plus genuine accumulator work. TDD: 503 PropertyTestingKitTests green, including a new concurrent-records lock-free-safety test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… analysis Adds a diagnostic that answers "is comparison volume concentrated in a few filterable sites, or is it the relevant SUT comparisons themselves?". When PTK_CMP_CENSUS=<path> is set (checked once in a constructor), sancov_dispatch_cmp records per comparison-site PC: fire count and min |arg1-arg2|, into a fixed lock-free open-addressing table; dladdr-symbolized and dumped atexit. Zero production cost when unset (one predicted-not-taken atomic load of g_cmp_census, same pattern as g_ever_covered). Used to settle the "measure fewer comparisons" lever (notebook Finding 41g): on the stlc SUT, 57% of comparison volume is Swift array bounds checks (_checkIndex/count on the de Bruijn [Typ] context), 31% genuine SUT-logic, and only ~12% is safely-droppable (generator/enum-equality/value-witness). The distance-approach filter is dead (most sites reach distance 0). So the cmp throughput cost is mostly intrinsic to measuring the value signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… strategies Drops synthesized/stdlib comparison sites from the trace-cmp hot path so the per-comparison dispatch tax concentrates on SUT-logic comparisons that actually witness bugs. Classifier sancov_cmp_should_drop() flags stdlib methods (Swift module / standard-substitution types — Array bounds checks, count getters, buffer copies), synthesized Equatable (__derived_enum_equals), value witnesses, and everything sancov_is_compiler_generated already flags (outlined/metadata). Keeps user-module SUT functions. Verdict cached per comparison-site PC (dladdr + classify on first fire, relaxed-load lookup thereafter); opt-in via env, default off = one predicted-not-taken load. Counter reports distinct dropped sites via on-demand scan, never per-comparison (a contended RMW there halved throughput). Measured on stlc: 89% comparison volume dropped (360.9M->39.5M), only SUT-logic symbols survive; boundarystate throughput +1.57x (412k->646k tests/6s). On the shift_var_leq cell the bug-witnessing `i < c` comparison is kept, so guidance is preserved; narrows but does not close the gap to newedge (which pays no cmp tax). 503 PTK tests green; new SanCovCmpDropTests covers the classifier (red->green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Synthesized/stdlib comparison sites carry no SUT signal; taxing them only slows the trace-cmp value-aware strategies. Flip PTK_CMP_DROP_SYNTHESIZED to default-on (measured +1.3-1.6× boundarystate throughput on stlc); opt out with =0 when a bug can manifest as a value at a stdlib bounds-check comparison. Hot-path branch hint flipped to expect-enabled. 503 PTK tests pass (the lone GlobalEverCovered failure is the pre-existing edge-bitmap cross-test flake — passes in isolation, unrelated to the cmp path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on drops) The profile (Finding 41i) showed macOS thread-local access (tlv_get_addr + pthread_getspecific + sancov_tls) is ~34% of the process and the single biggest floor. The drop check needs only the pc argument and the global table (a plain atomic load) — not the thread-local block — so moving it ahead of sancov_tls() lets every DROPPED comparison return without paying a tlv_get_addr. On stlc that skips ~320M TLS fetches/run (89% of comparisons drop). Safe ahead of the re-entry guard: the drop check fires no instrumented comparisons (SanCovHooks/libc aren't trace-cmp instrumented), a dropped site never reaches the recorder, and kept sites still hit the guard. stlc boundarystate ~638k vs ~446k tests/6s opt-out (1.43×). 503 PTK tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@inline

…oughput) The heaviest-trace profile (Finding 41k) showed boundaryState's once-per-iteration decide closure was the #1 cost cluster (~14%), allocation-bound: it rebuilt a perSite dictionary from the sites array just to feed boundarySignFeatures, which itself re-allocated the near-site selection, the features array, and a fresh sides() [UInt8] per site in nested loops — every iteration. Refactor: boundarySignFeatures gains an allocation-light core that reads sign mask + distance straight from the sites array into reused inout buffers (features + a near-site scratch kept in DistanceState), and walks the ≤3 mask bits inline via @inline(__always) forEachSide (no per-site array). The dict-keyed signature stays as a thin wrapper for tests/non-hot callers. The engine drops the perSite dict entirely. Result (stlc boundarystate, release): throughput ~638k → ~1.03M tests/6s = 1.6×; malloc/free fell out of the whole-process top 10 (was ~8.3% combined). 505 PTK tests pass (2 new: array-core parity with the dict reference + buffer reuse). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… bitmap

The edge-union novelty oracle ("has any run ever hit this edge?") was a Set<UInt32> in every strategy (.newEdge, .boundaryDistance/State, .comparisonCoverage), inserted per covered edge per iteration. SanCov edge indices are dense and bounded by the guard count, so a packed bit array (EdgeUnionBitmap) gives the same Set.insert(_:).inserted answer in O(1) with no hashing and no per-insert allocation after warm-up.

Profile-confirmed: for .newEdge (pure edge union) Set.insert + Hasher are now GONE from the whole-process top 10; its hot path is purely SanCov/routing.

Honest scope note: this barely moves .boundaryState, whose dominant Set is NOT seenEdges but seenSigns (the Set<UInt64> of sign features, high volume from the pairwise cross-product). seenEdges was the minor share. That is a separate lever (seenSigns can't be bitmapped: sparse 64-bit hashes). The bitmap is still strictly cheaper than the Set everywhere and never worse.

509 PTK tests pass (4 new EdgeUnionBitmap tests; the lone EntropicPolicy failure is the known RNG-tie flake, which fails ~1/3 in isolation and is unrelated). Workloads need `rm -rf .build` to pick up the new file (SwiftPM stale-plan gotcha).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
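
The packed-bitmap idea can be sketched in a few lines. `SketchEdgeBitmap` is a hypothetical illustration that assumes the edge count is known up front; it is not the library's EdgeUnionBitmap.

```swift
struct SketchEdgeBitmap {
    private var words: [UInt64]

    // One bit per edge index, rounded up to whole 64-bit words.
    init(edgeCount: Int) {
        words = Array(repeating: 0, count: (edgeCount + 63) / 64)
    }

    /// Mirrors `Set.insert(_:).inserted`: true only the first time an edge is seen.
    mutating func insert(_ edge: UInt32) -> Bool {
        let word = Int(edge >> 6)
        let bit: UInt64 = 1 << UInt64(edge & 63)
        precondition(word < words.count, "edge index exceeds the guard count")
        let isNew = words[word] & bit == 0
        words[word] |= bit
        return isNew
    }
}

var edgeUnion = SketchEdgeBitmap(edgeCount: 128)
let firstSeen = edgeUnion.insert(5)
let seenAgain = edgeUnion.insert(5)
let otherWord = edgeUnion.insert(64)
```

Membership is a shift, a mask, and one load, which is why no hashing shows up in the profile once the Set is replaced.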

…6pt of CPU) The heaviest-trace profile (Finding 41n) showed the boundaryState decide cost was seenSigns — a Set<UInt64> of sign-combination features inserted per iteration — costing Set.insert 4.16% + Hasher 2.56% ≈ 6.7% of the process. But those feature keys are ALREADY splitmix64-mixed hashes (encodeBoundarySign1/2), so Set<UInt64> re-hashed uniform bits with SipHash for nothing. FeatureHashSet: open-addressing UInt64 membership keyed on the value's own (pre-mixed) low bits — no Swift Hasher — with the Set.insert(_:).inserted contract and a separately-tracked literal 0 (the empty-slot sentinel). Same trick BoundarySiteAccumulator uses for PC keys. Swapped into seenSigns (.boundaryState/.boundaryDistance) and seenFeatures (.comparisonCoverage). Re-profile (stlc boundarystate): Set.insert + Hasher GONE; FeatureHashSet.insert is 1.11% (down from ~6.7%). Throughput ~1.0M → ~1.18M tests/6s. TLS routing (tlv_get_addr) is again the clear top cost; the decide-side data structures are now cheap. 513 PTK tests pass (4 new; lone EntropicPolicy fail is the known RNG-tie flake). Workloads need `rm -rf .build` for the new file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
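
A minimal sketch of an open-addressing set keyed on the value's own low bits, with the literal 0 tracked separately because 0 marks an empty slot. `SketchFeatureHashSet` is hypothetical; its growth policy (double at 50% load) is an assumption and not taken from FeatureHashSet.

```swift
struct SketchFeatureHashSet {
    private var slots: [UInt64]          // 0 marks an empty slot
    private var count = 0
    private var containsZero = false     // the literal key 0 is tracked separately

    init(minimumCapacity: Int = 16) {
        var capacity = 16
        while capacity < minimumCapacity * 2 { capacity *= 2 }   // keep a power of two
        slots = Array(repeating: 0, count: capacity)
    }

    /// Mirrors `Set.insert(_:).inserted`.
    mutating func insert(_ key: UInt64) -> Bool {
        if key == 0 {
            defer { containsZero = true }
            return !containsZero
        }
        if (count + 1) * 2 > slots.count { grow() }   // stay at or below 50% load
        return place(key)
    }

    private mutating func place(_ key: UInt64) -> Bool {
        let mask = slots.count - 1
        // Keys are assumed pre-mixed, so their low bits serve directly as the index.
        var index = Int(truncatingIfNeeded: key) & mask
        while slots[index] != 0 {
            if slots[index] == key { return false }
            index = (index + 1) & mask                // linear probing
        }
        slots[index] = key
        count += 1
        return true
    }

    private mutating func grow() {
        let old = slots
        slots = Array(repeating: 0, count: old.count * 2)
        count = 0
        for key in old where key != 0 { _ = place(key) }
    }
}

var seenFeatures = SketchFeatureHashSet()
let insertedOnce = seenFeatures.insert(0xDEAD_BEEF)
let insertedTwice = seenFeatures.insert(0xDEAD_BEEF)
let zeroOnce = seenFeatures.insert(0)
let zeroTwice = seenFeatures.insert(0)
```

Skipping the hasher is only sound when the keys are already uniformly mixed, which the commit says holds for the sign-feature hashes.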

The generator/mutator runs instrumented SUT code (a type-directed generator calls getTyp; mutators validate mutants the same way), but that coverage is NOT the property under test — it is reset away before the test runs. Dispatching and recording it (routing + first-hit + BoundarySiteAccumulator.record) was ~25% of the process on stlc. Add a per-thread `suppressed` flag in SanCovTLS; the dispatch_edge and dispatch_cmp hooks early-return when it is set. The fuzz loop sets it around the straight-line input-production block (no await, no thread hop) and clears it before resetCoverage so the test is always measured. Per-thread, so a mutating engine can't suppress a concurrently-testing one. Profile (stlc shift_var_leq, release): generateMutation subtree drops 33% -> 22% inclusive; get_current_coverage_map / record_first_hit / BoundarySiteAccumulator.record vanish from it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
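
A rough C sketch of the suppression pattern described above, assuming a plain thread-local flag. The real SanCovTLS layout and hook signatures are not shown in this PR text, so `tls_suppressed`, `tls_edges_recorded`, `dispatch_edge`, `produce_input`, `run_test`, and `run_iteration` are stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state. */
static _Thread_local bool     tls_suppressed = false;
static _Thread_local uint64_t tls_edges_recorded = 0;

/* Stand-in for the edge hook: skip all routing work while suppressed. */
static void dispatch_edge(uint32_t edge_index) {
    (void)edge_index;
    if (tls_suppressed) return;
    tls_edges_recorded++;
}

/* Stand-ins for input production and the test body; each hits instrumented edges. */
static void produce_input(void) { dispatch_edge(1); dispatch_edge(2); }
static void run_test(void)      { dispatch_edge(3); }

/* One iteration: suppress during input production, clear before measuring. */
static uint64_t run_iteration(void) {
    tls_suppressed = true;
    produce_input();         /* straight-line block: no thread hop */
    tls_suppressed = false;  /* cleared before the reset so the test is measured */
    tls_edges_recorded = 0;  /* stand-in for resetCoverage */
    run_test();
    assert(!tls_suppressed);
    return tls_edges_recorded;
}
```

Because the flag is thread-local, setting it on a thread that is producing inputs has no effect on another thread that is running a test at the same time.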

SyncBox (an NSLock-backed test utility) had leaked onto the per-dispatch observer path. Env-gated PTK_LOCK_METRICS instrumentation measured the damage on stlc/SinglePreserve (synchronous SUT, so contention was 0; the cost is the uncontended acquisition itself):

    hitCountBuckets.state    14,272,900 acquisitions / 20,000 tests = ~714/test
    boundaryDistance.state    1,386,368 / 1,385,000 = ~1/test

Rewrites (all mirror the lock-free BoundarySiteAccumulator: fixed-capacity flat per-slot atomics, claimed-index O(occupied) drain):

- HitCountAccumulator replaces the per-edge SyncBox in HitCountBucketsStrategy.onEdge. Throughput 20k -> ~565k tests/6s (~28x).
- AtomicFeatureSet replaces the per-comparison SyncBox in ComparisonCoverageStrategy.onCompare.
- ComparisonDictionary's OSAllocatedUnfairLock ring -> a flat atomic ring with a monotonic atomic cursor (I2S record path, per comparison).
- UncheckedBox (SyncBox's API minus the lock) replaces the per-iteration, decide-only SyncBoxes in newEdge / signatureMatch / boundaryDistance / pathTrie. The engine-lifetime halves moved to single-thread holders (decide is serialized per engine; observers never touch them).

Kept as SyncBox: boundarySign.diag (a file-scope global shared across engines, diagnostic-only). LockMetrics kept (env-gated, zero-cost off) for future audits. PathTrie.advance's NSLock is deferred (ordered mutable trie, not a flat accumulator), tracked in doordash-oss#46.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
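
As an illustration of the "flat atomic ring with a monotonic atomic cursor" mentioned above, here is a small C11 sketch. It is not ComparisonDictionary's implementation: the capacity, the stored payload (a single 64-bit value), the relaxed memory ordering, and the names `cmp_ring`, `cmp_ring_record`, and `cmp_ring_snapshot` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_CAPACITY 8u /* illustrative size */
static_assert((RING_CAPACITY & (RING_CAPACITY - 1)) == 0,
              "capacity must be a power of two");

typedef struct {
    _Atomic uint64_t cursor;               /* total records ever written */
    _Atomic uint64_t slots[RING_CAPACITY]; /* newest values, overwritten in place */
} cmp_ring;

/* Record path: one fetch-add claims a slot, one store fills it. No lock. */
static void cmp_ring_record(cmp_ring *r, uint64_t value) {
    uint64_t ticket =
        atomic_fetch_add_explicit(&r->cursor, 1, memory_order_relaxed);
    atomic_store_explicit(&r->slots[ticket & (RING_CAPACITY - 1)], value,
                          memory_order_relaxed);
}

/* Copy out the occupied slots in slot order; returns how many were copied. */
static unsigned cmp_ring_snapshot(cmp_ring *r, uint64_t out[RING_CAPACITY]) {
    uint64_t written = atomic_load_explicit(&r->cursor, memory_order_relaxed);
    unsigned n = written < RING_CAPACITY ? (unsigned)written : RING_CAPACITY;
    for (unsigned i = 0; i < n; i++)
        out[i] = atomic_load_explicit(&r->slots[i], memory_order_relaxed);
    return n;
}
```

The cursor only ever increases, so writers never coordinate beyond the fetch-add. A snapshot taken while writers are active may mix old and new values, which is acceptable for a best-effort dictionary of recent comparison operands but would not be for an exact log.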

Edge-only strategies (newEdge / hitCountBuckets / pathTrie / signatureMatch) attach no comparison recorder, yet the trace-cmp hooks still fired sancov_dispatch_cmp ~33-35M times / 6s — each paying sancov_tls() + get_current_coverage_map() — and ZERO reached a consumer (measured via the new env-gated PTK_DISPATCH_COUNT counters). That was ~43% of all dispatch-TLS, pure waste (Finding 43). Add a process-global g_cmp_recorder_count, adjusted on the 0<->nonzero transition in sancov_context_set_cmp_recorder (exchange the old bits) and in end_measurement (sever). sancov_dispatch_cmp now early-returns after the drop filter but BEFORE the TLS fetch when no recorder is attached anywhere (and the census is off) — both are plain global loads, no TLS. Race-free: the count moves only at measurement setup/teardown. A mixed run with any cmp-consuming engine keeps count > 0, so a real consumer is never suppressed. Result: newEdge/hitCountBuckets cmp_dispatches 33-35M -> 0; newEdge clean throughput ~+8.5%; boundaryState unchanged (consumes every comparison). TDD: SanCovCmpRecorderGateTests asserts the count tracks attach / re-attach / clear / end_measurement. 48 SanCov + 75 strategy/routing tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
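
The gate can be pictured with the following C sketch. It only models the counting rule (move the count on the empty-to-attached and attached-to-empty transitions, check it before any per-thread work); the `context` struct, `context_set_cmp_recorder`, `dispatch_cmp`, and the counters are hypothetical stand-ins, and the attach path here is written for single-threaded setup rather than with the atomic exchange the commit mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*cmp_recorder_fn)(uint64_t a, uint64_t b);

typedef struct { cmp_recorder_fn recorder; } context; /* hypothetical */

static _Atomic int g_recorder_count = 0;  /* contexts with a recorder attached */
static uint64_t g_slow_path_entries = 0;  /* times the expensive path ran */
static uint64_t g_recorded = 0;           /* comparisons seen by the sample recorder */

/* Sample recorder used for illustration. */
static void count_recorder(uint64_t a, uint64_t b) {
    (void)a; (void)b;
    g_recorded++;
}

/* Attach, replace, or clear a recorder. The count moves only on transitions. */
static void context_set_cmp_recorder(context *ctx, cmp_recorder_fn fn) {
    assert(ctx != NULL);
    cmp_recorder_fn old = ctx->recorder;
    ctx->recorder = fn;
    if (old == NULL && fn != NULL) atomic_fetch_add(&g_recorder_count, 1);
    if (old != NULL && fn == NULL) atomic_fetch_sub(&g_recorder_count, 1);
}

/* Hook body: a plain global load decides whether any further work happens. */
static void dispatch_cmp(context *current, uint64_t a, uint64_t b) {
    if (atomic_load_explicit(&g_recorder_count, memory_order_relaxed) == 0)
        return;                 /* nobody is listening anywhere */
    g_slow_path_entries++;      /* stand-in for the TLS fetch and routing */
    if (current->recorder) current->recorder(a, b);
}
```

Re-attaching a recorder to a context that already has one leaves the count unchanged, so the count stays equal to the number of listening contexts.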

Reorder sancov_dispatch_cmp so the process-global cmp-recorder gate (g_cmp_recorder_count == 0 && census == NULL) runs FIRST, before the synthesized/stdlib drop filter. Edge-only strategies (newEdge / hitCountBuckets / pathTrie / signatureMatch) attach no cmp recorder, so they now skip both cmp_drop_should_skip (~6%, Finding 43) and the TLS fetch on every kept comparison — the gate is two plain global atomic loads, the cheapest possible early-out. Behavior is unchanged: when count==0 the hook already did nothing observable; this only makes that path cheaper. Confirmed by profiling newedge — cmp_drop_should_skip drops entirely out of the sancov_dispatch_cmp subtree. Guarded by existing SanCovCmpRecorderGateTests + SanCovCmpDropTests (7 green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Newer Swift toolchains conform integer types to the stdlib's AtomicRepresentable (Synchronization) in addition to swift-atomics' AtomicValue, so the bare `UInt64.AtomicRepresentation` becomes ambiguous and the build breaks after an Xcode/SDK bump — with no change to our code or the swift-atomics pin. `AtomicRep<T>` constrains the lookup to swift-atomics' AtomicValue, resolving it unambiguously. Applied across the lock-free accumulators that allocate flat atomic-storage buffers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… walk

Task doordash-oss#49: an instrumented SUT value-`destroy` fires a coverage edge while a FuzzResult's Corpus is torn down inside a cooperative worker's ~AsyncTask. `g_coverage_inheritance_key` is process-global and never cleared, so routing walks the dying task's task-local chain; its head (task+136) is freed and poisoned (in `sancov_is_valid_pointer`'s coarse range but unmapped), so the raw memcpy faults -> SIGSEGV. (This is the real story behind the old "cross-session g_target_context UAF" guess.)

Fix, pinned with a real .ips + live lldb (which refuted a first count-gate hypothesis: g_active_ctx_count was 2 at the fault, not 0):

1. Runtime-authoritative gate: the runtime's own swift_task_localValueGet is task-state-aware and returns 0 without faulting on a dying task. Track whether it ran and only fall back to the manual chain walk when it did NOT, removing the fragile walk from the common path (also a perf win).
2. Fault-safe reads: safe_read() wraps vm_read_overwrite(mach_task_self(), ..) so the residual fallback can't fault either; a poisoned chain just ends the walk (= no inherited context).

Verified by a TDD regression (mmap+munmap a page = valid-range-but-unmapped poison at task+136, expect the seam returns 0, no fault; deterministic SIGSEGV pre-fix) and 3000 lldb relaunch-until-crash iterations clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y flake EntropicPolicyTests asserted on weighted-draw outcomes that depend on FastRNG, which is unseedable — so ties broke nondeterministically and the suite flaked. Rather than seed FastRNG (it stays a zero-dispatch thread-local shim on the hot path), follow swift-dependencies' own withRandomNumberGenerator pattern: a `\.fastRandomNumberGenerator` dependency wrapping FastRNG via Point-Free's WithRandomNumberGenerator. WeightedPoolCore resolves it once in init and draws scalars inside its @sendable closure (capturing only locals, never self). Tests override it with a seeded DeterministicRNG (SplitMix64) for reproducible draws. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
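
The reason a seeded generator removes the flake can be shown with a small C sketch: when the draw takes its randomness from an injected SplitMix64 state, two runs with the same seed make identical choices. This is not the PR's DeterministicRNG or WeightedPoolCore; `splitmix64`, `splitmix64_next`, and `weighted_draw` are hypothetical names, and the modulo reduction ignores bias for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t state; } splitmix64;

/* Standard SplitMix64 step: advance by the golden-ratio increment, then mix. */
static uint64_t splitmix64_next(splitmix64 *g) {
    uint64_t z = (g->state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Weighted draw over `n` weights using an injected generator. */
static size_t weighted_draw(splitmix64 *g, const uint32_t *weights, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += weights[i];
    assert(n > 0 && total > 0);
    uint64_t ticket = splitmix64_next(g) % total; /* sketch: modulo bias ignored */
    for (size_t i = 0; i < n; i++) {
        if (ticket < weights[i]) return i;
        ticket -= weights[i];
    }
    return n - 1; /* not reached when total > 0 */
}
```

A test that seeds the generator can then assert on exact draw outcomes, while production code keeps passing its fast unseeded source through the same parameter.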

@inline

The A/B (Findings 45-48) showed the boundary-sign vocabulary bought zero bug-finding over the raw distance gradient, so remove it entirely: delete BoundarySignEncoding + the boundaryState strategy and its tests, and drop the boundarySigns field threaded through the acceptance/pool path. The accumulator now stores ONLY the per-site minimum |arg1 - arg2|: one atomic word per bucket, no packing, no sign, no near-window.

Hot-path perf on the per-comparison cmp channel:

- absoluteDifference is branchless: wrap-once + conditional negate lowers to subs+cneg (2 instr) vs the two-subtraction ternary's sub+subs+csel (3).
- record() is @inline(__always). The module builds non-WMO (one .o per file), so without it record stayed an out-of-line cross-file tail-call from onCompare. It has a single hot caller, so folding it in (with its already-inlined hash/probe/updateSlot helpers) costs no code size and makes the whole per-comparison path a single call-free leaf, verified by disasm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
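
The "wrap-once + conditional negate" form and the one-word minimum can be sketched in C as below. The operand type is an assumption (unsigned 64-bit), the names `absolute_difference` and `record_min` are hypothetical, and which instructions a compiler actually emits depends on the target and flags, so the disassembly claims above are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* |a - b| for unsigned operands: subtract once with wraparound, then negate
   the wrapped result when a < b. */
static inline uint64_t absolute_difference(uint64_t a, uint64_t b) {
    uint64_t d = a - b;
    return a < b ? (uint64_t)0 - d : d;
}

/* Keep the per-site minimum in one atomic word via a compare-exchange loop. */
static void record_min(_Atomic uint64_t *slot, uint64_t distance) {
    uint64_t seen = atomic_load_explicit(slot, memory_order_relaxed);
    while (distance < seen &&
           !atomic_compare_exchange_weak_explicit(slot, &seen, distance,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* a failed exchange refreshed `seen`; retry only if still larger */
    }
    /* the slot only ever decreases, so it is now at most `distance` */
    assert(atomic_load_explicit(slot, memory_order_relaxed) <= distance);
}
```

Storing only the minimum means a slot initialized to `UINT64_MAX` needs no separate "empty" marker for this sketch.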

- project.pbxproj: add AtomicRep.swift / DeterministicRNG.swift, drop the deleted BoundarySignEncoding + boundary-sign/state test files, so the project keeps building from Xcode.
- Package.resolved: remove the now-unused package-jemalloc pin.
- open-instruments.sh: launch the benchmark ourselves with the patched runtime on DYLD_LIBRARY_PATH and attach xctrace (the Instruments GUI "Choose Target" path can't set it, so the binary aborts on _swift_coroFrameAlloc); add an optional time-limit arg.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…mentation

Move EmitCmpTrace + TagCompilerGenerated from /tmp into LLVMPasses/, built by scripts/build-llvm-plugins.sh (auto-run at the top of build-local-toolchain.sh) against the patched toolchain's LLVM. EmitCmpTrace emits the trace_cmp callbacks we want (dropping trap-guard comparisons); TagCompilerGenerated tags compiler-generated functions NoSanitizeCoverage. Dylibs build to .build/llvm-plugins (gitignored) since they link this toolchain's LLVM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Package.swift computes the plugin dylib path from #filePath and a loadPass() helper, then loads TagCompilerGenerated (and EmitCmpTrace for the cmp targets, replacing stock -sanitize-coverage=…,trace-cmp) on every instrumented target. TagCompilerGenerated loads first so EmitCmpTrace skips tagged functions. Builds green with the runtime filters still present (they become near-no-ops). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The TagCompilerGenerated/EmitCmpTrace plugins filter compiler-generated edges and trap-guard comparisons at compile time, so the runtime filters are dead code. Remove from SanCovHooks (2413->2029 lines): the lazy edge filter (sancov_apply_edge_filter, sancov_is_compiler_generated, g_edge_state, the first-fire classify + on-disk cache + atexit, and the per-edge consult in __sanitizer_cov_trace_pc_guard) and the cmp drop filter (sancov_cmp_should_drop, cmp_drop_should_skip, g_cmp_drop_table, cmp_drop_init, PTK_CMP_DROP_SYNTHESIZED). Drop the header decls, SanCovCounters.applyEdgeFilter/filteredEdgeCount, and the FuzzEngine.run call. The hot path loses a per-edge g_edge_state load and a per-cmp drop-table probe; the compiler-generated classifier now lives only in the plugin. Delete the now-obsolete SanCovEdgeFilterTests + SanCovCmpDropTests, trim PCResolutionTest's classifier tests, and remove the applyEdgeFilter() calls from the determinism tests. Suite green (SanCovTests 37, ScheduleControlTests 32, PropertyTestingKitTests 503, GenericTimerPollerTests 26); determinism 10/10. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CoverageStrategy.compose([...]) / .combined(with:) unions N strategies into one engine: every substrategy's onEdge/onCompare/onReset runs, every decision runs (no short-circuit, so each updates its own novelty oracle) and the results are OR-ed. Pool vocabularies are merged — features namespaced per substrategy via a SplitMix64 finalizer so two strategies' raw values can't collide in the shared ownership space, boundaryDistances merged per site by the closer value. Add .boundaryDistanceOnly: the comparison channel without the edge-coverage union, so composing it with an edge strategy (e.g. .pathTrie.combined(with: .boundaryDistanceOnly)) doesn't double-count edges. Plain .boundaryDistance is unchanged (== .newEdge unioned with this). The admission side composes for free: a composed engine publishes both the namespaced features and the merged distances, and PoolAdmission .boundaryDistanceOwnership already culls over resolvedFeatures + boundaryDistances (featureOwnership covers the edge-only case). No new ledger needed. 4 composition tests; full suite 507 green. (Requires the SUT built with the cmp channel for the boundary axis to fire.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
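
Two details of the composition above lend themselves to a short C sketch: running every sub-decision before OR-ing the results, and namespacing raw feature values per substrategy with the SplitMix64 finalizer. The toy `oracle` type and the functions `oracle_decide`, `composed_decide`, and `namespaced_feature` are hypothetical, and the way the substrategy index is folded in before mixing is an assumption, not the PR's scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy novelty oracle standing in for one substrategy's private state. */
typedef struct {
    uint64_t seen;  /* bit i set once feature i has been observed */
    unsigned calls; /* how many times this oracle was consulted */
} oracle;

static bool oracle_decide(oracle *o, unsigned feature) {
    assert(feature < 64);
    o->calls++;
    uint64_t bit = UINT64_C(1) << feature;
    bool fresh = (o->seen & bit) == 0;
    o->seen |= bit;
    return fresh;
}

/* Every oracle runs and updates itself; only then are the answers OR-ed. */
static bool composed_decide(oracle *oracles, const unsigned *features, size_t n) {
    bool accepted = false;
    for (size_t i = 0; i < n; i++) {
        bool fresh = oracle_decide(&oracles[i], features[i]); /* always evaluated */
        accepted = accepted || fresh;
    }
    return accepted;
}

/* SplitMix64 finalizer: a bijective mix of a 64-bit value. */
static uint64_t splitmix64_finalize(uint64_t z) {
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Assumed namespacing: offset the raw value by a per-strategy salt, then mix. */
static uint64_t namespaced_feature(uint32_t strategy_index, uint64_t raw) {
    uint64_t salt = ((uint64_t)strategy_index + 1) * UINT64_C(0x9E3779B97F4A7C15);
    return splitmix64_finalize(raw + salt);
}
```

Writing `accepted = accepted || oracle_decide(...)` directly would skip later oracles once one accepts, which is exactly the short-circuit the commit says to avoid. Because the finalizer is a bijection, the same raw value under two different salts always maps to two different outputs; collisions between different raw values in different namespaces remain possible but unlikely.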

…Discovery The library default for `MutationScheduler.weightedPool(admission:)` was `everyDiscovery` — every strategy-accepted input joins the pool and nothing ever leaves. On stlc that floods the pool with ~2400 entries whose median wire size is 421 chars (max 1469); mutating one of those giants rarely lands on the relevant node. Flip the default to `featureOwnership` (libFuzzer-style REDUCE): each feature is owned by the smallest witness, larger owners are evicted. Measured on the clean stlc baseline this collapses the live pool to ~20 entries of median size 49, and the mean *executed* term shrinks 5x (323 -> 65 chars). On the hard de Bruijn mutant shift_var_leq the flip finds the bug 20/20 at median 4.0s vs everyDiscovery's 17/20 at 6.7s — better detection AND speed, the direct payoff of smaller, better-targeted mutation parents. Callers that want the old keep-everything behavior still pass `admission: .everyDiscovery` explicitly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
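
A minimal C sketch of the REDUCE-style rule described above: each feature is owned by the smallest input that witnesses it, and an entry that no longer owns anything can be evicted. The fixed feature count, the `ownership` table, and the functions `ownership_init`, `ownership_offer`, and `ownership_is_live` are hypothetical; treating equal sizes as "no steal" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FEATURE_COUNT 8 /* illustrative */
#define NO_OWNER (-1)

typedef struct {
    int    owner[FEATURE_COUNT];      /* entry id owning each feature */
    size_t owner_size[FEATURE_COUNT]; /* input size of that owner */
} ownership;

static void ownership_init(ownership *o) {
    for (size_t f = 0; f < FEATURE_COUNT; f++) {
        o->owner[f] = NO_OWNER;
        o->owner_size[f] = 0;
    }
}

/* Offer an input; returns how many features it claimed. */
static unsigned ownership_offer(ownership *o, int entry, size_t size,
                                const unsigned *features, size_t n) {
    unsigned claimed = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned f = features[i];
        assert(f < FEATURE_COUNT);
        /* unowned, or strictly smaller than the current owner */
        if (o->owner[f] == NO_OWNER || size < o->owner_size[f]) {
            o->owner[f] = entry;
            o->owner_size[f] = size;
            claimed++;
        }
    }
    return claimed;
}

/* An entry stays in the pool only while it still owns at least one feature. */
static bool ownership_is_live(const ownership *o, int entry) {
    for (size_t f = 0; f < FEATURE_COUNT; f++)
        if (o->owner[f] == entry) return true;
    return false;
}
```

Under this rule the pool size is bounded by the number of distinct features, which is consistent with the pool collapsing from thousands of entries to a few dozen small ones.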

…urce)

Faithful re-port of PR doordash-oss#54 (trace-cmp-substrate, tip 06d5799) onto post-doordash-oss#45 main. Source compiles; tests follow in the next commit.

- SanCovHooks C: trace-cmp hooks (sancov_dispatch_cmp, cmp recorder slots + lifecycle, in_cmp_recorder re-entry guard, TLS coalescing, cmp drop filter, lock-free cmp accumulator, dispatch suppression) grafted onto main's file via 3-way merge, preserving main's inheritance-walk fix.
- Seam: CoverageEngine gains onCompare/boundaryDistances; CoverageStrategy attaches a ComparisonObserver and captures boundaryDistances; CoverageProbe threads boundaryDistances through CoverageVerdict and now hosts the I2S dictionary (it owns the measurement context).
- Strategies: comparisonCoverage, boundaryDistance(+Only), compose/combined; lock-free AtomicFeatureSet/HitCountAccumulator/BoundarySiteAccumulator/EdgeUnionBitmap/FeatureHashSet; AtomicRep + UncheckedBox + LockMetrics.
- Scheduler: PoolIterationOutcome.boundaryDistances; PoolEvent.inserted gains parent/claimed; PoolAction.setMutationDepth; WeightedPoolCore chains mutation depth in next() and judges on the whole outcome; BoundaryDistanceLedger + boundaryDistanceOwnership admission; AdaptiveDepthPolicy/Math; SchedulerProbe.
- I2S: ComparisonDictionary (moved to FuzzCore) + Int mutator hook; opt-in via PTK_INPUT_TO_STATE / task-local.
- Build: in-repo LLVM pass plugins (EmitCmpTrace, TagCompilerGenerated) replace the runtime edge/cmp filters; Package.swift loads them via -load-pass-plugin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…wo grafts

Brings in the branch's trace-cmp tests and reconciles the existing suites with the post-doordash-oss#43/doordash-oss#44/doordash-oss#45 architecture. All three targets green: PropertyTestingKitTests 532, SanCovTests 37, ScheduleControlTests 32 (601 total, 0 unexpected failures).

Test ports (assertions preserved, API surface only):

- Strategy tests use the non-generic CoverageEvaluator (evaluate(context, client)) and read CoverageAcceptance.boundaryDistances.
- Scheduler tests drive WeightedPoolCore via the generic init / WeightedPoolHarness (generationRatio replaces burstLength/focusOnInsert), decide() not next()-directive, and the 5-tuple PoolEvent.inserted(parent:claimed:).
- EntropicPolicyTests/.inserted bumped to the 5-tuple; InheritanceTest/FuzzStats kept at main's versions; SanCovEdgeFilterTests deleted (runtime filter is compile-time now).
- New suites: Cmp/ComparisonObserver/GlobalEverCovered, AdaptiveDepth*, Atomic/HitCount/EdgeUnion/FeatureHashSet/BoundarySite accumulators, ComparisonDictionary, Fuzz/IntInputToState, LockMetrics, SanCovCmpRecorderGate/Suppression; DeterministicRNG support.

Two source grafts completed (missed in the source commit):

- LockMetrics moved to FuzzCore (SyncBox, its consumer, lives there) + SyncBox's metrics/acquire()/forceMetrics integration grafted onto main's FuzzCore SyncBox.
- chainMutate restored as a public SchedulerSupport helper; WeightedPoolCore.next() calls it (preserves AdaptiveDepthChainTests' direct assertion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

xcodegen generate to pick up the new trace-cmp sources/tests and the LockMetrics/ComparisonDictionary/AtomicRep moves into FuzzCore. project.yml keeps -sanitize-coverage=edge,pc-table without -load-pass-plugin: as on the source branch, the LLVM pass plugins are wired for the CLI build only (Package.swift + build-llvm-plugins.sh), so the Xcode build is edge-coverage only and the cmp channel / I2S are CLI-only — the branch's existing design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

twof · 2026-06-19T02:05:06Z

Structurally re-ported onto post-#45 `main`

This branch was force-updated from a structural re-port of the original 45-commit stack onto the rearchitected main (scheduler-owns-pool / FuzzCore split / CoverageProbe seam / UInt64 features). The prior tip is preserved at trace-cmp-substrate-prerebase (06d5799).

What's included (full faithful re-port):

- trace-cmp substrate: C cmp hooks (sancov_dispatch_cmp, per-context cmp recorders + lifecycle, in_cmp_recorder re-entry guard, TLS coalescing, drop filter, lock-free cmp accumulator), grafted onto main's SanCovHooks.c preserving its inheritance-walk fix; ComparisonObserver bridge.
- seam: CoverageEngine.onCompare/boundaryDistances; CoverageProbe/CoverageVerdict thread boundaryDistances (mirroring features) and host the I2S dictionary.
- strategies: comparisonCoverage, boundaryDistance(+Only), compose/combined; lock-free accumulators (Atomic/HitCount/BoundarySite/EdgeUnion/FeatureHashSet).
- scheduler: boundaryDistanceOwnership admission, AdaptiveDepthPolicy, setMutationDepth + depth chaining, SchedulerProbe.
- I2S mutation (the measured win on direct-dataflow/magic-value bugs): ComparisonDictionary + Int mutator hook, opt-in via PTK_INPUT_TO_STATE.
- build: in-repo LLVM pass plugins (EmitCmpTrace, TagCompilerGenerated) replace the runtime edge/cmp filters.

Eval status (unchanged from the original branch): the cmp acceptance strategies (comparisonCoverage, boundaryDistance, adaptive-depth) remain documented-negative baselines — edge coverage still wins; they are not defaults. The substrate + I2S are the keepers.

Tests: PropertyTestingKitTests 532, SanCovTests 37, ScheduleControlTests 32 — all green (per-target; the full-parallel run has the pre-existing #49 sancov flake).

Not benchmarked (per the agreement on the #45 stack).

Xcode note (pre-existing design): project.yml keeps -sanitize-coverage=edge,pc-table without -load-pass-plugin — the LLVM passes are wired for the CLI build only (Package.swift + build-llvm-plugins.sh). So the Xcode build is edge-coverage-only; the cmp channel + I2S are CLI-only, exactly as on the source branch.

🤖 Generated with Claude Code

…clock flake) PoollessSchedulerTests.poollessSchedulerRuns drove a 0.3s wall-clock fuzz and asserted totalInputs > 0. Under the full 120-way parallel suite, probe setup + campaign-scope install can exhaust the 0.3s budget before the first iteration (the loop forces a time check on iteration 1, FuzzStateMachine.swift:199/211), so the run executes zero inputs and the expectation fails. This was the lone non-known issue in the otherwise-green full-parallel run. Fix mirrors the committed FuzzStatsAccountingTests cure: a generous wall-clock ceiling plus a deterministic stopAfter(N) iteration-counter plugin, so "the pool-less scheduler drives the engine" holds regardless of machine load. Verified 8/8 clean full-parallel runs (632 tests, 0 real failures). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…uator/Ledger)

First step of the feedback-agnostic refactor. Introduces the metric-agnostic ownership primitive and the two evaluators that hold the ownership criterion:

- OwnershipLedger: feature(domain,id) -> owner roster, reassignment + eviction, sequential entry IDs. Knows nothing of size/distance; it records claims an evaluator already decided.
- EdgeOwnershipEvaluator: REDUCE (smallest input owns an edge; ties don't steal).
- BoundaryDistanceEvaluator: lowest |arg1-arg2| per pc owns; ties break toward the smaller input (a refinement over BoundaryDistanceLedger, which never broke ties).

These are the decomposed halves of today's FeatureOwnershipLedger and BoundaryDistanceLedger; not yet wired (additive). 10 new tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
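
The boundary-distance criterion, including its tie-break, fits in a few lines of C. This sketch is only an illustration of the stated rule (lowest distance owns; equal distance goes to the smaller input); the `site_owner` struct and `boundary_offer` function are hypothetical, and leaving the owner unchanged when both distance and size are equal is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of who currently owns one comparison site (pc). */
typedef struct {
    bool     occupied;
    int      entry;      /* owning entry id */
    uint64_t distance;   /* that owner's |arg1 - arg2| at this site */
    size_t   input_size; /* that owner's input size */
} site_owner;

/* Offer a candidate for one site. Returns true when it takes ownership. */
static bool boundary_offer(site_owner *site, int entry,
                           uint64_t distance, size_t input_size) {
    assert(site != NULL);
    bool claim;
    if (!site->occupied)
        claim = true;                              /* first witness */
    else if (distance != site->distance)
        claim = distance < site->distance;         /* closer to the boundary wins */
    else
        claim = input_size < site->input_size;     /* tie: smaller input wins */
    if (claim) {
        site->occupied = true;
        site->entry = entry;
        site->distance = distance;
        site->input_size = input_size;
    }
    return claim;
}
```

The evaluator only decides who should own a site; recording the claim and evicting entries that own nothing is the ledger's job in the decomposition described above.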

…lel cmp path

Phase 1c. The admission no longer owns the ownership *criterion*; it composes the decomposed pieces:

- featureOwnership now runs EdgeOwnershipEvaluator + BoundaryDistanceEvaluator and records their claims in the generic OwnershipLedger. The boundary evaluator is inert unless the strategy publishes boundaryDistances, so this one admission subsumes the former boundaryDistanceOwnership: add the cmp signal and the same admission culls over it.
- Deleted FeatureOwnershipLedger, BoundaryDistanceLedger, and the boundaryDistanceOwnership admission (all subsumed).
- cmp is no longer a parallel ownership mechanism; it's a peer evaluator feeding the same roster as edges.

Test suites migrated to compose evaluator+ledger; the unchanged pool-level admission suites (now pinned to .featureOwnership) validate behavior is preserved. 543 PropertyTestingKitTests pass, 0 non-known failures. xcodeproj regenerated for the deleted/added files.

Note: boundary ties now break toward the smaller input (a refinement; the old ledger never broke boundary ties). The boundaryDistances signal carrier on the verdict stays until Phase 3 gives cmp its own probe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…tation providers

Phase 2 of the feedback-agnostic refactor. Coverage is no longer a privileged top-level fuzz() knob; it is the provider the chosen scheduler vends.

- SchedulerFactory gains makeProviders() -> [any InstrumentationProvider] (default []). The engine installs only providers whose key matches the scheduler's requiredProbes, so a pool-less / cmp-only / exotic scheduler pays for nothing it didn't ask for.
- MutationScheduler becomes a namespace (caseless enum); weightedPool(...) gains a coverageStrategy: param and returns a concrete WeightedPoolFactory that vends the CoverageProvider. (Dropping the wrapper struct also sidesteps a patched-toolchain parameter-pack parser fault that fired when a forwarding member sat beside the pack-generic makeScheduler.)
- fuzz()/regress() and FuzzEngine convenience drop coverageStrategy; runEngines builds providers via scheduler.makeProviders(). Replay forces .alwaysInteresting by passing weightedPool(coverageStrategy: .alwaysInteresting).
- Call sites migrated: `coverageStrategy: X` -> `scheduler: MutationScheduler.weightedPool(coverageStrategy: X)`. TestHelpers' helpers now take `scheduler:` to mirror the new API.

543 PropertyTestingKitTests pass, 0 non-known failures; full build (incl. benchmarks) clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…ger (Phase 3) Demonstrates the payoff of the Signal/Evaluator/Ledger decomposition: the scheduler carries the comparison signal itself (.boundaryDistance) — vends the cmp-recording provider and culls over the comparison-distance axis through the same OwnershipLedger as edges, with the boundary evaluator a peer of the edge evaluator. The engine names no signal. Scope note (in the test): a *pure* cmp-only run (.boundaryDistanceOnly, no edge axis) additionally needs an instrumented comparison in the SUT (a bare stdlib `Int ==` lowers to the uninstrumented stdlib at -Onone, emitting no trace_cmp in-target) and a corpus retention signature that isn't SparseCoverage. Those are the remaining Phase-3 items; this pins the wired-and-green part. 544 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…o compiler bug) Phase 2 made MutationScheduler a caseless enum to dodge a "pack expansion ... can only appear in a variadic type" error I attributed to a patched-toolchain parser fault. That attribution was WRONG: a clean build of the wrapper-struct design (struct: SchedulerFactory wrapping `any SchedulerFactory`, forwarding makeScheduler + makeProviders, with a static weightedPool returning Self) compiles fine and 544 tests pass. The earlier failures were incremental-build staleness during rapid edit->build cycles, not a compiler bug — five reduced single-file variants all compile clean, and the full clean build of the real wrapper does too. Restores the wrapper struct (the design originally intended): keeps the "vend any scheduler, not only the weighted pool" promise as a value type rather than a namespace. weightedPool returns MutationScheduler again; call sites use it as any SchedulerFactory unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

…et (B′)

The corpus used to make the scheduling decision: it checked each input for novel coverage and saved it. That is no longer its job: the scheduler's ownership ledger owns retention. So stop maintaining the corpus during the run and build it once, after the loop, from what the scheduler vends.

- AnyScheduler.observe returns Void (folds signals into the scheduler's own state); new AnyScheduler.snapshot() -> [(repeat each Input)] is read once by the engine at run-end to build the corpus. WeightedPoolCore.snapshot vends its live pool (evicted entries already gone); a pool-less scheduler vends [].
- Corpus is no longer coverage-keyed: delete CorpusEntry.sparseCoverage, Corpus.signatures / mergeCoverageAndAdd / addIfInteresting, and SubmitToCorpusAction.sparseCoverage; add() drops its sparse: param. Plugin failures still append during the run via addToCorpus, just without coverage.
- Cross-engine mergeCorpusSnapshots dedups by encoded input bytes (CorpusEntry already encodes exactly the input array) instead of coverage signatureHash; made internal so the dedup contract is unit-tested. On-disk format is unchanged; coverage was never serialized.

545 PropertyTestingKitTests green, including two new pins: the engine builds the corpus from snapshot() (not observe), and cross-engine merge dedups by input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
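
Deduplicating by encoded input bytes, rather than by a coverage signature, can be sketched in C as a byte-wise comparison over the merged entries. The `encoded_input` struct and the functions `same_input` and `dedup_by_bytes` are hypothetical, and the quadratic scan is chosen for brevity; nothing here is taken from mergeCorpusSnapshots itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one corpus entry's encoded input. */
typedef struct {
    const unsigned char *bytes;
    size_t               length;
} encoded_input;

/* Two entries are the same input when their encodings match byte for byte. */
static bool same_input(const encoded_input *a, const encoded_input *b) {
    return a->length == b->length &&
           (a->length == 0 || memcmp(a->bytes, b->bytes, a->length) == 0);
}

/* Compact `entries` in place, keeping the first occurrence of each encoding.
   Returns the new count. O(n^2), which is fine for a sketch. */
static size_t dedup_by_bytes(encoded_input *entries, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool duplicate = false;
        for (size_t j = 0; j < kept && !duplicate; j++)
            duplicate = same_input(&entries[i], &entries[j]);
        if (!duplicate) entries[kept++] = entries[i];
    }
    assert(kept <= n);
    return kept;
}
```

Keying on the bytes means two engines that found the same input through different coverage paths contribute it once, and two different inputs with identical coverage are both kept.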

…mitToCorpus

Make the corpus a value materialized from the scheduler at run-end, with no mid-run writers at all: the engine holds no corpus during a run.

- Remove the submitToCorpus action + SubmitToCorpusAction. The shrinking plugin no longer persists the minimized failing input; it still minimizes, biases mutation (selectForMutation), and records the issue. Failure retention for regression is tracked separately (doordash-oss#55).
- With no mid-run writers, drop FuzzStateMachine's held Corpus and its ctor param. FuzzStateMachineResult.corpus is now a CorpusSnapshot built at run-end from scheduler.snapshot(). FuzzEngine drops corpusRegistry.getCorpus() and the redundant .snapshot().
- Delete now-dead types: CorpusEntryType, FailureInfo, CorpusClient/CorpusRegistryProtocol (the registry was only the engine's empty-Corpus factory). CorpusEntry is now just { input, scheduleBytes }.
- Keep the mutable Corpus class as a test-only value builder (the engine no longer touches it); regenerate the Xcode project for the deleted files.

The scheduler is per-engine and torn down after the run, so the result owns a materialized snapshot; there is no live-reference view into a freed scheduler.

545 PropertyTestingKitTests green, 0 failures (ShrinkingPluginTests now expects 2 plugin actions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

There is no mutable corpus type any more. The engine already materialized its result as a CorpusSnapshot at run-end (previous commit); the Corpus class was only a test-time value builder. Remove it and the tests that exercised its stateful behavior.

- Delete the Corpus class from Corpus.swift; keep CorpusSnapshot (now also exposes `inputs`).
- Remove tests of Corpus statefulness: CorpusTests.swift, and testCorpusIsEmpty / testCorpusInputs / testCorpusComplexTypes in PropertyBasedSelfTests (the cross-engine input-identity merge test stays).
- Convert strategy tests that used a Corpus as a counting sink to assert on the evaluator's own signal (boolean / snapshot count); drop ~9 dead `let corpus` leftovers; build CorpusSnapshot directly where a snapshot was needed.
- Anchor realisticCoverageGapTest's expected gap line to the function via #line (funcAnchor + 12) so edits elsewhere in the file no longer shift it. Removing a line above it had moved the SUT and broken the hardcoded expectedLine.

Regenerate the Xcode project for the deleted file. 541 PropertyTestingKitTests green, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

signal-evaluator-ledger is the successor architecture: signal-agnostic scheduler (AnyScheduler/SchedulerFactory + makeProviders), the Signal/Evaluator/Ledger decomposition (OwnershipLedger + edge/boundary evaluators), and corpus-as-view (no mutable Corpus, no sparseCoverage, no submitToCorpus). It deletes the bolted-on cmp path (BoundaryDistanceLedger) and the coverage-keyed corpus that trace-cmp-substrate still carried.

Per decision, SEL supersedes trace-cmp-substrate: this merge takes SEL's tree wholesale and makes trace-cmp-substrate the canonical branch going forward. Both parents are recorded.

Superseded (NOT carried into the tree): trace-cmp-substrate perf/substrate work that optimized the old design and must be re-applied to the new architecture if still wanted:

- hot-path perf: lock excision (Finding 42), no-SipHash feature sets, bitmap edge union, per-iteration alloc kills, cmp-recorder gating, cmp drop filter, dispatch suppression during gen/mutation
- distance-only boundary accumulator; runtime->compile-time filters
- featureOwnership-by-default tuning

Tree is identical to signal-evaluator-ledger @ 1c4f840 (541 tests green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP

twof and others added 30 commits June 12, 2026 09:08

twof and others added 17 commits June 15, 2026 15:48

twof force-pushed the trace-cmp-substrate branch from 06d5799 to a4927a1 Compare June 19, 2026 02:04

twof and others added 11 commits June 18, 2026 19:26

add perf parsing script

cf7cf0d

twof wants to merge 58 commits into doordash-oss:main from twof:trace-cmp-substrate
