Trace cmp substrate #54

Open
twof wants to merge 58 commits into doordash-oss:main from twof:trace-cmp-substrate

@twof twof commented Jun 19, 2026

Contributor

No description provided.

twof and others added 30 commits June 12, 2026 09:08
Mutator.mutate becomes (Value, inout FastRNG) -> Value: one mutant per
call, variety from the RNG. Effort per seed (burst size, stacking) now
belongs to the engine, not the mutator: selectForMutation queues a
fixed burst (mutationBurstLength = 16) of single-step mutants, each
mutating one randomly chosen pack position via mutateOnePosition.

Built-in conformances and specialty mutators keep their candidate
enumerations and pick one per call; compose picks a random component;
ScheduleByteMutator picks one strategy per call. Test fixtures that
asserted exhaustive enumeration now assert membership/coverage over
draws (FastRNG is thread-local and unseedable).

Groundwork for the pool scheduler (focus + counter): a selection
becomes a unit of requested work, so burst size and mutation depth can
become scheduler knobs.
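
A minimal sketch of the single-step shape described above, under stated assumptions: `SketchRNG` is a hypothetical seedable stand-in for FastRNG (which is thread-local and unseedable), `SketchMutator` and `queueBurst` are illustrative names, and the input pack is simplified to a homogeneous `[Int]`.

```swift
// Hypothetical stand-in for FastRNG: a seedable SplitMix64 generator.
struct SketchRNG: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// One mutant per call; variety comes from the RNG, not from enumeration.
struct SketchMutator<Value> {
    let mutate: (Value, inout SketchRNG) -> Value
}

let mutationBurstLength = 16

// Engine-side effort: a fixed burst of single-step mutants, each changing
// one randomly chosen position of the (simplified) input pack.
func queueBurst(seed: [Int], mutator: SketchMutator<Int>, rng: inout SketchRNG) -> [[Int]] {
    precondition(!seed.isEmpty)
    var burst: [[Int]] = []
    for _ in 0..<mutationBurstLength {
        var mutant = seed
        let position = Int.random(in: 0..<seed.count, using: &rng)
        mutant[position] = mutator.mutate(seed[position], &rng)
        burst.append(mutant)
    }
    return burst
}

// Example mutator: always moves the value by a nonzero step.
let bump = SketchMutator<Int>(mutate: { value, rng in
    value &+ Int.random(in: 1...8, using: &rng)
})
```

Because burst size lives in `queueBurst` rather than in the mutator, a scheduler can later vary it without touching any mutator conformance.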

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mutation scheduling moves off the plugin bus into a per-engine scheduler
component: fuzz(scheduler: .weightedPool()) owns the pool of interesting
inputs, draws with focus+counter bursts (one fresh generation between
bursts), and composes policy inside it — PoolAdmission decides
membership, child PoolPlugins advise weights/evictions via
owner-mediated actions and hear every membership change (.inserted/
.removed re-broadcast). Children are non-generic: events carry entry
IDs and coverage, never typed inputs, so policies work under any input
pack and schedule fuzzing.

The engine consults the scheduler only when the residual queue (seeds,
queueInputs, bus bursts) is empty; queue semantics are unchanged, so
stopWhenQueueEmpty replay and selectForMutation lineage still hold.
Pool mutants report a new IterationContext.poolParentID — a separate
namespace from parentID (bus-plugin originIDs) on purpose.

corpusMutation and energyMutation are deleted: .weightedPool() with
.everyDiscovery admission and focusOnInsert is the corpusMutation loop,
and the Entropic scoring math (kept, with its characterization tests)
becomes a pool weight advisor next. Default plugins are now empty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PoolAdmission.featureOwnership — libFuzzer's corpus model: an accepted
input joins the pool only by owning >= 1 coverage feature (unowned, or
stolen from a strictly larger owner; covered-edge count is the REDUCE
metric, ties don't steal). An entry losing its last feature is evicted
through the same removal path as child evictions, so every policy hears
it. Bounds the mutation pool by the feature space regardless of the
coverage strategy's acceptance rate; rejected accepts get no burst and
no residence (strict semantics).
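
The ownership rule can be sketched as follows. This is an illustration only, not the FeatureOwnershipLedger implementation: `OwnershipLedgerSketch`, its `judge` signature, and the integer entry IDs are assumptions, and `size` stands for the covered-edge count used as the REDUCE metric.

```swift
struct OwnershipLedgerSketch {
    var owner: [UInt64: (entry: Int, size: Int)] = [:]   // feature -> current owner
    var ownedCount: [Int: Int] = [:]                     // entry -> number of features owned

    /// Returns whether `entry` joins the pool and which residents it evicted.
    mutating func judge(entry: Int, size: Int, features: Set<UInt64>) -> (admitted: Bool, evicted: [Int]) {
        var claimed: [UInt64] = []
        for feature in features {
            if let current = owner[feature] {
                // Steal only from a strictly larger owner; ties don't steal.
                if size < current.size { claimed.append(feature) }
            } else {
                claimed.append(feature)                  // unowned feature
            }
        }
        // Strict semantics: owning nothing means no residence.
        guard !claimed.isEmpty else { return (false, []) }

        var evicted: [Int] = []
        for feature in claimed {
            if let previous = owner[feature] {
                ownedCount[previous.entry, default: 0] -= 1
                if ownedCount[previous.entry] == 0 {     // lost its last feature
                    ownedCount[previous.entry] = nil
                    evicted.append(previous.entry)
                }
            }
            owner[feature] = (entry, size)
        }
        ownedCount[entry] = claimed.count
        return (true, evicted.sorted())
    }

    var residents: [Int] { ownedCount.keys.sorted() }
}
```

Every resident owns at least one feature and each feature has one owner, so the pool can never hold more entries than there are features.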

Admission verdicts can now carry evictions (PoolAdmission.Verdict);
.everyDiscovery is unchanged in behavior.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
EntropicWeightPolicy (PoolPlugin): yield/executions keyed by pool entry
ID off the event stream — .iteration with a pool parent attributes
executions and credits discovered features to the parent's yield (even
when admission rejects the mutant: a rejected discovery is still
information about the parent's neighborhood); .inserted registers the
entry and updates global feature frequencies; .willDraw flushes weights
(rarity terms cached, abundance fresh per draw); .removed entries stop
receiving weights but keep stats for in-flight lineage.

Scoring math unchanged (entropicWeightCombining & co., pinned by the
existing characterization vectors). Sugar: policies: { [.entropic()] }.

This is the composition the bus architecture couldn't express: entropic
selection over a feature-ownership-culled pool.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ckets)

A coverage strategy can now publish the vocabulary the mutation pool
accounts feature ownership in, instead of the ledger always falling back
to bare edge indices:

- CoverageEngine gains an optional features closure, collected inside
  the same gated window as an accepting decide.
- .pathTrie publishes sliding k-grams (default k=2, configurable via
  .pathTrie(gramLength:)) of the ordered first-hit path — PathTrie now
  records the path and judges-and-collects in one critical section.
  Gram hashes are deterministic position-dependent FNV-1a (PathGrams).
- .hitCountBuckets publishes (edge << 8 | bucketBit) pairs.
- Features widen to UInt64 end-to-end (ledger, entropic policy + math);
  PoolIterationOutcome.resolvedFeatures is the single vocabulary every
  pool component reads, falling back to widened edge indices.

Rationale: pathTrie accepts on path novelty but the pool culled on edge
sets, capping retention at one entry per edge (142 on fsub). K-grams
match the retained diversity to the acceptance criterion (measured: 1.1k
features at k=2, decaying accept rate vs pathTrie's flat 37%).
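
A rough sketch of the k-gram vocabulary, assuming each gram is hashed by running 64-bit FNV-1a over the little-endian bytes of its edge indices in path order (so the hash depends on order within the gram). `fnv1a` and `pathGrams` are illustrative names; the actual PathGrams encoding may differ.

```swift
/// 64-bit FNV-1a over the bytes of each edge index in the gram, in path order.
func fnv1a(_ edges: ArraySlice<UInt32>) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325        // FNV-1a 64-bit offset basis
    for edge in edges {
        for shift in stride(from: 0, to: 32, by: 8) {
            hash ^= UInt64((edge >> shift) & 0xFF)
            hash = hash &* 0x100000001b3         // FNV-1a 64-bit prime
        }
    }
    return hash
}

/// Sliding k-grams of the ordered first-hit path, one feature per window.
func pathGrams(_ path: [UInt32], gramLength k: Int = 2) -> Set<UInt64> {
    guard k > 0, path.count >= k else { return [] }
    var grams = Set<UInt64>()
    for start in 0...(path.count - k) {
        grams.insert(fnv1a(path[start..<(start + k)]))
    }
    return grams
}
```

Two paths that cover the same edge set in a different order now publish different features, which is what lets the pool retain more than one entry per edge.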

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
nil publishes no features at all — the pool falls back to covered edge
indices, the coarsest (most flood-controlling) setting. Needed both as
the user-facing opt-out and to compare vocabularies within one build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Probe findings on fsub (10s, single engine, culled admission): the k-gram
vocabulary grew the pool 44 -> 500 entries, admission 6% -> 48% of
accepts, burst completion 97% -> 60%, and - the dominant harm - executed
terms drifted 2.2x bigger (wire 174 -> 391), cutting iteration throughput
by more than half (39k -> 15k). Root cause: vocabulary size IS the population
ceiling (every resident owns >= 1 feature, one owner per feature), so
refining the vocabulary to distinguish inputs better inseparably raised
the cap.

capacity: Int? on .weightedPool() decouples the two: admission still
decides WHO is distinctive in the strategy's vocabulary; the bound
decides HOW MANY stay. Overflow evicts the lowest-weight resident
(ties: oldest; never the newcomer), and REDUCE bankruptcies run first
so they can spare an innocent.

Evicted owners stay GHOSTS: their feature claims persist. The
alternative (releasing claims on eviction) was probed and rejected -
re-opened features made every accept a re-claimant (admission 48% ->
91%, 1.3k evictions/10s, burst completion 34%, throughput down again):
a revolving-door FIFO. A represented feature stays represented; only
genuinely new features or strictly smaller witnesses win residence.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pool's only size metric was the covered-edge count, which saturates
once coverage does (fsub: median ~100/142 for every entry) — REDUCE ties
never steal, eviction can't see bloat, and mutant-of-mutant term drift is
invisible to every pool mechanism (probed: executed term wire 174 -> 391,
throughput 39k -> 15k iters/10s under a fine vocabulary).

- Mutator gains an optional `size` closure (the workload knows its
  value's real size); compose/combined propagate any component's measure.
- The engine sums measured sizes across the input pack, only on accepted
  runs, into PoolIterationOutcome.inputSize.
- FeatureOwnershipLedger judges REDUCE on the real size when present
  (covered-edge count stays the fallback), so smaller witnesses steal
  features even when coverage counts tie.
- The capacity victim is now lowest weight, then LARGEST measured input,
  then newest — eviction targets the drift monsters directly. Unmeasured
  pools keep the elder-anchoring evict-newest rule; the edge-count proxy
  deliberately never feeds the eviction order (more covered edges mark a
  better entry, not a worse one).
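
The eviction order above can be written as a three-key comparison. A hedged sketch: `ResidentSketch` and `capacityVictim` are hypothetical names, and integer IDs are assumed to increase with insertion order so that a larger ID means a newer entry.

```swift
struct ResidentSketch {
    let id: Int             // assumed to grow with insertion order (larger = newer)
    let weight: Double
    let measuredSize: Int?  // nil when the workload supplies no size measure
}

/// Picks the capacity victim: lowest weight, then largest measured input, then newest.
func capacityVictim(_ residents: [ResidentSketch]) -> Int? {
    residents.min(by: { a, b in
        if a.weight != b.weight { return a.weight < b.weight }
        // Unmeasured entries compare as equal here, so the rule falls
        // through to evict-newest, as described for unmeasured pools.
        let sizeA = a.measuredSize ?? 0
        let sizeB = b.measuredSize ?? 0
        if sizeA != sizeB { return sizeA > sizeB }
        return a.id > b.id
    })?.id
}
```

Note that the covered-edge count never appears in the comparison, matching the point that more covered edges mark a better entry rather than a worse one.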

Also fixed in the toolchain fork (3e4ce3824e6): storing an
Optional-of-closure field in a generic struct made SILCombine's witness
devirtualization crash at pack-element call sites (layoutIsTypeDependent
judged enum payloads by unsubstituted interface type).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… opt-in)

10-trial replication killed every 3-trial k-gram win:
- fsub pathTrie kgram+cap64: 90.0%, 2 never-solved vs edges 91.7%, 0
  (the original '93.5%/0 beats baseline' cell was favorable noise)
- stlc pathTrie kgram unbounded: 84.5%/0 vs edges 84.0%/0 (the original
  '91.7 vs 86.7' win was noise; stlc probe shows the configs differ only
  in residence — admission is identical — and stlc merely tolerates the
  drift instead of profiting)

So .pathTrie publishes no vocabulary by default; pathTrie(gramLength:)
remains the opt-in (pair it with the scheduler's capacity: bound —
vocabulary size is otherwise the pool's population ceiling).

hitCountBuckets drops its (edge, bucket) features entirely: that
vocabulary equals its acceptance criterion, so every accept owned a fresh
feature and culling silently turned off (fsub regressed exactly to its
unculled rate, 71.3% vs 86.1% edge-culled). Edge fallback is the working
configuration; a coarser bucket-only vocabulary is deliberately not
pursued after the universal no-benefit result.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds the comparison-coverage (cmplog/value-profile) substrate's first layer:
the trace-cmp half of the SanitizerCoverage hooks, mirroring the existing edge
recorder path. SanitizerCoverage's __sanitizer_cov_trace_cmp* hooks deliver the
operands of each instrumented integer comparison, giving a gradient (e.g.
popcount(a^b) as an input nears a boundary `i < c`) that pure edge coverage is
blind to.

- New SanCovCmpRecorder slot on SanCovMeasurementContext (independent of the
  edge recorder slot), with the same attach/reset/release lifecycle.
- __sanitizer_cov_trace_{,const_}cmp{1,2,4,8} + trace_switch capture the call
  PC via __builtin_return_address(0) and route operands through
  sancov_dispatch_cmp to the context's cmp recorder.
- CmpRecorderTests: attach round-trip, slot independence, dispatch routing,
  reset/release lifecycle (7 tests, all green).

Validated separately that real instrumented Swift fires these hooks with usable
operands (standalone probe, edge,trace-cmp): the i<c gradient is exactly the
missing signal for the de Bruijn boundary bugs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Builds the Swift half of comparison coverage on top of the trace-cmp C hooks.

- ComparisonObserver: the cmp analog of EdgeObserver — a strategy's onCompare
  closure, attached to a measurement context that co-owns it (retain at attach,
  release at last reference). Rides the independent cmp recorder slot, gated by
  the same per-thread observer gate as edges.
- CoverageEngine gains onCompare; makeEvaluator attaches a comparison observer
  when set, routing onReset to whichever observer (edge or cmp) is the sole one.
- comparisonCoverage strategy: records (comparison-site PC, popcount(a ^ b)) per
  comparison (libFuzzer value profile), interesting iff a new such feature OR a
  new edge (union with .newEdge). The Hamming-distance gradient drives mutation
  toward a boundary `i < c` even when the edge set is unchanged. Publishes no
  culling vocabulary (avoids the acceptance==vocabulary tautology seen with hitCountBuckets).

11 new tests (ComparisonObserverTests, ComparisonCoverageStrategyTests), full
suite green.
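
As an illustration of the value-profile feature, here is a small sketch that records (site PC, popcount(a ^ b)) pairs and reports novelty. `ValueProfileSketch` and its bit packing are assumptions made for the example; the edge-union half of the acceptance rule is omitted.

```swift
struct ValueProfileSketch {
    var seen = Set<UInt64>()

    /// Returns true when this comparison produced a feature not seen before.
    mutating func onCompare(pc: UInt64, _ a: UInt64, _ b: UInt64) -> Bool {
        // Hamming distance between the operands: 0...64, so 7 bits suffice.
        let distance = UInt64((a ^ b).nonzeroBitCount)
        // Illustrative packing: assumes the top 7 bits of the PC are unused.
        let feature = (pc << 7) | distance
        return seen.insert(feature).inserted
    }
}
```

An input that moves the operands one bit closer at the same site yields a new feature even though the edge set is unchanged, which is the gradient the strategy accepts on.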

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oat)

The shift_var_leq experiment (stlc, 20s, 8 trials) shows acceptance-based value
profile over-accepts: solve rate 4/8 vs newEdge's 8/8, despite being a strict
superset of newEdge's acceptance. The cmp operands belong in input-to-state
mutation, not in the acceptance gate. Document the result on the strategy so it
isn't promoted as a default; the trace-cmp substrate remains the foundation for
the I2S mutator that should actually pay off.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The real payoff lever for trace-cmp (the acceptance-based comparisonCoverage
strategy under-performed): use the captured comparison operands to MUTATE inputs
toward satisfying the comparison, the auto-dictionary / RedQueen mechanism.

- ComparisonDictionary: a bounded, thread-safe ring of recently-seen comparison
  operands, published via the `current` task-local. The engine attaches a
  comparison observer that feeds it (when I2S is enabled and the cmp slot is
  free) and binds it around the mutation loop.
- The framework Int mutator samples `current` for both mutate and generate:
  half the draws jump to a recorded operand (or a ±1 neighbour, for `<`/`<=`
  boundary bugs), the rest stay ordinary so I2S guides without starving search.
- Opt-in via the `inputToStateEnabled` task-local (isolated per campaign — no
  process-global race) or the PTK_INPUT_TO_STATE env var (launch-time).
- PropertyTestingKitTests built with trace-cmp so the integration test exercises
  the real cmp hooks: I2S reaches a ~10^14 magic-value bug in 3s that random
  search cannot. Plus unit tests for the dictionary and the Int mutator.
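
A simplified sketch of the mechanism, with assumptions stated up front: `OperandRingSketch` and `mutateIntSketch` are hypothetical names, the ring is not thread-safe here, the task-local plumbing is left out, and the "ordinary" half of the draws is reduced to a small random step.

```swift
// Bounded ring of recently seen comparison operands (thread safety omitted).
struct OperandRingSketch {
    private(set) var operands: [Int] = []
    private var oldest = 0
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0)
        self.capacity = capacity
    }

    mutating func record(_ operand: Int) {
        if operands.count < capacity {
            operands.append(operand)
        } else {
            operands[oldest] = operand            // overwrite the oldest slot
            oldest = (oldest + 1) % capacity
        }
    }
}

// About half the draws jump to a recorded operand or a ±1 neighbour;
// the rest stay ordinary so the dictionary guides without starving search.
func mutateIntSketch<R: RandomNumberGenerator>(
    _ value: Int, dictionary: OperandRingSketch?, using rng: inout R
) -> Int {
    if let ring = dictionary, !ring.operands.isEmpty, Bool.random(using: &rng) {
        let operand = ring.operands.randomElement(using: &rng)!
        return operand &+ Int.random(in: -1...1, using: &rng)
    }
    return value &+ Int.random(in: -16...16, using: &rng)
}
```

With a magic constant in the ring, roughly every second mutant lands on or next to it, which is how a value far outside the reach of small random steps becomes reachable.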

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ue axis)

A directional, competitive admission signal on comparison operands, in
contrast to comparisonCoverage's value-profile acceptance (which keeps every
novel distance and bloats — finding 16). Each comparison site (pc) becomes an
ownership feature owned by the input that drove its operands closest together
(lowest |arg1 - arg2|, absolute numeric difference, overflow-safe); a strictly
closer input steals, ties don't, so the owner only ever gets closer and churn
terminates — the value-axis analog of REDUCE. Additive over edge ownership:
one entry roster, evict when owning nothing in either dimension, so the
measurable delta vs featureOwnership is purely the boundary dimension.

- BoundaryDistanceLedger + PoolAdmission.boundaryDistanceOwnership
- CoverageStrategy.boundaryDistance: accepts on newEdge OR a strict per-site
  distance improvement (monotone, not novelty); publishes the run's per-site
  minimum distance for the ledger to cull on
- boundaryDistances channel: CoverageEngine -> CoverageAcceptance ->
  PoolIterationOutcome -> admission (judge generalized to the outcome;
  internal-only signature change, public PoolAdmission API unchanged)
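
The per-site ownership rule can be sketched like this. It is an illustration under assumptions: `boundaryDistance`, `BoundaryLedgerSketch`, and `offer` are hypothetical names, operands are treated as unsigned, and the edge dimension and eviction roster are omitted.

```swift
/// Overflow-safe absolute difference of two unsigned operands.
func boundaryDistance(_ a: UInt64, _ b: UInt64) -> UInt64 {
    a > b ? a - b : b - a
}

struct BoundaryLedgerSketch {
    // comparison-site PC -> the entry that came closest, and how close
    var best: [UInt64: (entry: Int, distance: UInt64)] = [:]

    /// Returns true when `entry` takes ownership of `site`.
    mutating func offer(entry: Int, site: UInt64, distance: UInt64) -> Bool {
        // Only a strictly closer input steals; ties don't.
        if let current = best[site], distance >= current.distance { return false }
        best[site] = (entry, distance)
        return true
    }
}
```

Since a site's recorded distance can only decrease and is bounded below by zero, the number of ownership changes per site is finite, which is why churn terminates.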

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t fix

Productivity-weighted, adaptive-depth pool policy (two per-seed scores: draw
weight + mutation-depth cascade). Defaults tuned to alpha=0.02, ceiling=45
(swept: doubles solve rate on compound-structure bugs vs the original 0.05/90,
which over-escalated depth into the 0%-productive tail). Eval: best config
overall (1700/1752 with the workload recursive mutators).

- AdaptiveDepthMath/Policy + .setMutationDepth/.inserted(parent,claimed) plumbing
  across PoolPlugin/WeightedPoolCore/Feature+BoundaryDistance ledgers/Entropic.
- chainMutate(depth:) in FuzzStateMachine; .mutate reads per-entry depth.
- SchedulerProbe: per-iteration (source, depth, accepted) hook (zero-cost when
  nil) for productivity/depth/gen-vs-mutation diagnostics.
- FuzzStatsAccountingTests: fix flaky seeds/mutations assertions under full-suite
  cooperative-pool starvation (min(seedCount,total) + guard past the seed phase).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-covered diagnostic

Two SanCov changes from resolving "do failing runs reach full SUT coverage?":

1. Crash fix — sancov_dispatch_cmp had no re-entry guard. A cmp recorder
   compiled into a trace-cmp module (the test target is) fires comparisons in
   its OWN body; each re-entered sancov_dispatch_cmp -> recorder -> ... without
   bound, overflowing the stack (SIGBUS, ~500 frames, confirmed via crash
   report: captureRecorder recursing 254x). Pre-existing (reproduces on clean
   HEAD); CmpRecorderTests crashed deterministically and took the full parallel
   suite down with it. Add tls_in_cmp_recorder (the cmp twin of
   tls_in_edge_observer), set across the recorder call and the cmp reset hook so
   a recorder/hook can never re-dispatch into itself. CmpRecorderTests.dispatch
   test now snapshots+detaches before asserting (the trace-cmp-instrumented
   #expect comparisons would otherwise re-fire the recorder). Full suite green
   3x (475 tests, no signal).

2. Diagnostic — process-global "ever-covered" edge bitmap (g_ever_covered, set
   in sancov_dispatch_edge post-filter, never cleared by the engine's
   per-iteration reset; default-NULL/disabled = one predicted-not-taken load).
   Answers the true executed-edge union of a whole run, which the per-iteration
   context and admitted-only corpus cannot. Swift API on SanCovCounters +
   GlobalEverCoveredTests. Result: stlc hard cells reach 100% SUT-logic coverage
   (54/54) on no-counterexample runs => edge-coverage guidance is saturated; the
   lever for shift_var_leq / subst_abs_no_shift is value-aware, not coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Ownership)

Adds a third pool-ownership dimension on top of boundary distance: the k-wise
three-valued SIGN combinations over near-boundary comparison sites. Distance is
a per-site gradient that drives the search TOWARD a comparison's flip point;
sign captures which SIDE each near-boundary site landed on — the {<,==,>}
position that edge coverage collapses (== shares the not-taken branch of `a<b`
with >). Witnesses for off-by-one / conjunction bugs need a JOINT state (site A
on its boundary AND site B on a particular side), so the vocabulary is the
pairwise sign combinations (bounded vs the intractable 3^n full product),
discovery-owned so the pool holds partial witnesses and crosses them toward the
conjunction. Motivated by Findings 35/37: on shift_var_leq the boundary A0
(i==c) is abundantly reachable but we never hold (A0,B+) jointly, and coverage
is blind to it.

- .boundaryState(window:maxSites:) strategy: keeps boundaryDistance's gradient +
  edge union, adds sign-combination acceptance + publishing.
- boundaryStateOwnership admission: BoundaryDistanceLedger gains a discovery-
  owned sign dimension (never stolen; a qualitative state has no "closer").
- BoundarySignEncoding: three-valued sign + deterministic (non-Hasher) 1-wise
  and order-independent 2-wise feature hashes; near-boundary windowing + cap.
- Plumbed boundarySigns through CoverageEngine/CoverageAcceptance/
  PoolIterationOutcome/FuzzStateMachine. boundaryDistanceOwnership unchanged
  (passes no signs → dimension inert), so the two arms can be A/B-compared cleanly.

Full suite 493 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A comparison site hit many times in one run (loop body / recursive
descent) previously contributed only the sign at its closest approach,
dropping the other sides it visited and breaking ties order-dependently.

Track a near-boundary sign MASK per site instead: OR 1<<sign for every
hit within `window` (near-gated in onCompare, so a far loop iteration
never joins the mask). boundarySignFeatures emits one singleton per side
in the mask and the cross-product of sides per pair, so a loop straddle
contributes every near side it touched. min-distance gradient + window
gate unchanged. Perf-neutral, so it hardens the final shape ahead of the
onCompare throughput rework.
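
A sketch of the mask-based feature emission, assuming the sign is encoded as 0 for `<`, 1 for `==`, 2 for `>`, and that features are mixed with a splitmix64-style finalizer. All names here (`mix64`, `signOf`, `SiteMaskSketch`, `recordHit`, `signFeatures`) are illustrative, and the site cap is omitted.

```swift
// splitmix64-style finalizer, used as a deterministic (non-Hasher) mixer.
func mix64(_ x: UInt64) -> UInt64 {
    var z = x &+ 0x9E3779B97F4A7C15
    z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
    z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
    return z ^ (z >> 31)
}

// Three-valued sign: 0 for a < b, 1 for a == b, 2 for a > b.
func signOf(_ a: UInt64, _ b: UInt64) -> UInt8 { a < b ? 0 : (a == b ? 1 : 2) }

struct SiteMaskSketch { var pc: UInt64; var mask: UInt8 = 0 }

// OR in the side only when the hit lies within `window` of the boundary.
func recordHit(_ site: inout SiteMaskSketch, _ a: UInt64, _ b: UInt64, window: UInt64) {
    let distance = a > b ? a - b : b - a
    if distance <= window { site.mask |= 1 << signOf(a, b) }
}

func sides(of mask: UInt8) -> [UInt8] {
    var result: [UInt8] = []
    for side in UInt8(0)..<3 where mask & (1 << side) != 0 { result.append(side) }
    return result
}

// One singleton per side in each mask, plus the cross-product of sides per site pair.
func signFeatures(_ sites: [SiteMaskSketch]) -> Set<UInt64> {
    func encode(_ pc: UInt64, _ side: UInt8) -> UInt64 { mix64(mix64(pc) &+ UInt64(side)) }
    var features = Set<UInt64>()
    for (i, a) in sites.enumerated() {
        for sideA in sides(of: a.mask) {
            let ea = encode(a.pc, sideA)
            features.insert(ea)
            for b in sites[(i + 1)...] {
                for sideB in sides(of: b.mask) {
                    let eb = encode(b.pc, sideB)
                    // min/max makes the pair hash independent of site order.
                    features.insert(mix64(min(ea, eb)) ^ max(ea, eb))
                }
            }
        }
    }
    return features
}
```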

497 PTK tests green (+4: loop-straddle, far-side-excluded, mask contract).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Env-gated process-global accumulator that records, per run, the
near-boundary participant count and each site's side-multiplicity, then
reports the current pairwise (k≤2) vocabulary size against the full
subset product (all k) and full width product (k=n). Default-off; one
bool check on the feature-emission path. Measured on stlc/boundaryState:
the full subset product blows up ~1.7e9× (10^11–10^14 features/run,
intractable); pairwise is the right cut. See notebook Finding 40.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiling the cmp dispatch (notebook Finding 41) showed the per-comparison
tax was the Swift.Dictionary write into currentRun — SipHash + copy-on-write
ARC (~50% of the cmp channel in release) — NOT the lock (~6%), correcting the
earlier guess. Replace the per-comparison SyncBox<[UInt64: SiteApproach]> with
BoundarySiteAccumulator: a concrete open-addressing PC -> (minDistance,
signMask) map over raw UnsafeMutablePointer buffers (no generics, no Hasher,
no bounds checks/exclusivity, no element ARC), updated via a non-generic
record(), reduced once at decide via snapshot().
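
For orientation, a simplified open-addressing map with the same shape. This is a sketch, not BoundarySiteAccumulator: it uses Swift arrays instead of raw UnsafeMutablePointer buffers, a fixed power-of-two capacity instead of growth, no synchronisation, and assumes a PC is never 0 so that 0 can mark an empty slot.

```swift
struct SiteAccumulatorSketch {
    private var keys: [UInt64]          // 0 marks an empty slot (PC assumed nonzero)
    private var minDistance: [UInt64]
    private var signMask: [UInt8]
    private(set) var didOverflow = false

    init(capacity: Int = 1024) {
        precondition(capacity > 0 && capacity & (capacity - 1) == 0,
                     "capacity must be a power of two")
        keys = Array(repeating: 0, count: capacity)
        minDistance = Array(repeating: .max, count: capacity)
        signMask = Array(repeating: 0, count: capacity)
    }

    mutating func record(pc: UInt64, distance: UInt64, sign: UInt8) {
        let mask = keys.count - 1
        // Multiplicative hash of the PC; no Hasher involved.
        var slot = Int(truncatingIfNeeded: (pc &* 0x9E3779B97F4A7C15) >> 32) & mask
        for _ in 0..<keys.count {
            if keys[slot] == 0 { keys[slot] = pc }          // claim an empty slot
            if keys[slot] == pc {
                minDistance[slot] = min(minDistance[slot], distance)
                signMask[slot] |= 1 << sign
                return
            }
            slot = (slot + 1) & mask                        // linear probe
        }
        didOverflow = true                                  // table full: drop the record
    }

    /// Reduce once per decision into an ordinary dictionary.
    func snapshot() -> [UInt64: (minDistance: UInt64, signMask: UInt8)] {
        var result: [UInt64: (minDistance: UInt64, signMask: UInt8)] = [:]
        for slot in keys.indices where keys[slot] != 0 {
            result[keys[slot]] = (minDistance[slot], signMask[slot])
        }
        return result
    }
}
```

The per-comparison path touches only three flat buffers; the dictionary is built once at `snapshot()`, which moves the hashing cost off the hot path.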

The accumulator MUST stay synchronised: coverage contexts are keyed by Swift
task and inherited by child tasks, so a property spawning concurrent work
routes cmp hooks from several threads into one context (the same reason the
edge map uses an atomic CAS and pathTrie locks its trie). Kept a lock but
swapped NSLock -> os_unfair_lock (NSLock's objc_msgSend was ~25% of the now-
small channel).

Release cmp-dispatch self-time: 1473ms -> 543ms = 2.7x (-63%). Behavior
unchanged: 502 PTK tests green (+5 accumulator tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
No-GUI CPU profiling for the comparison hot path (Instruments deep-copy is
unavailable headless):
- ProfiledBenchmark: trace-cmp instrumented + a comparison-dense closure driven
  through real fuzz(.boundaryState); PROFILE_STRATEGY/CMP_PER_INPUT/FUZZ_MS knobs.
- scripts/aggregate-time-profile.py: parse `xctrace export` time-profile XML to
  self/total CPU per symbol; --under isolates a subtree's internal breakdown.
  Resolves Instruments' <weight>/<frame> ref= dedup while streaming.
- scripts/record-cmp-profile.sh: build -> xctrace --attach record -> export ->
  aggregate, end to end. Needs full Xcode (DEVELOPER_DIR=Xcode-beta).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ddr 24.5%→14.6%)

On macOS every distinct _Thread_local accessed in a dylib lowers to a
tlv_get_addr function call. The per-comparison hot path (sancov_dispatch_cmp →
get_current_coverage_map) touched ~6 distinct thread-locals, paying ~6
tlv_get_addr per instrumented comparison — profiled at ~24% of the
cmp-dispatch subtree (release).

Coalesce the 10 scattered _Thread_local globals into one _Thread_local
SanCovTLS struct + a sancov_tls() accessor. Each hot entry point fetches the
block address ONCE and threads `ts` down through the routing helpers
(get_current_coverage_map / set_tls_measurement_context /
get_current_task_for_measurement / ensure_tls_coverage_map all take `ts`), so
callees never re-fetch. Pure storage refactor: identical routing logic, same
atomics, same locks, per-thread by construction (no new sharing).

Measured (release, boundarystate, CMP_PER_INPUT=256): tlv_get_addr self-time
24.47% → 14.64% of the cmp subtree (−40%). The residual is the one unavoidable
TLS fetch per comparison.

Also fix scripts/aggregate-time-profile.py: it resolved <frame> and <weight>
ref= dedup but not <backtrace ref=>, so back-referenced sample rows (95%+ of a
hot loop's samples) got empty stacks and attributed to nothing — manufacturing
a phantom "96% unsymbolicated". Now resolves all three ref levels; the same
trace attributes 99.9% to the runEngines branch.

Tests: SanCovTests 39 + ScheduleControlTests 32 + PropertyTestingKitTests 502
all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ead 61.5%→40.5% of process)

The per-comparison cmp-dispatch path was dominated by two costs after the TLS
coalescing: the os_unfair_lock taken per record() (~26% of the cmp subtree, an
out-of-line libsystem call) and a retain/release pair per comparison in the
recorder bridge (~11%, RefCountBitsT atomics).

LOCK → LOCK-FREE (BoundarySiteAccumulator): the lock was only required because
grow()'s realloc could race concurrent readers (task-inherited child tasks route
cmp hooks from several threads into one accumulator). Make the table
fixed-capacity (default 8192, far above any real workload's distinct cmp-site
count; drops + sets didOverflow if it ever fills) so there is no realloc, then
update each slot with per-slot atomics: claim via key CAS, distance via a
load-then-weak-CAS min that early-outs with no RMW when the distance doesn't
improve (the steady state), sign via atomic OR. Occupied-slot indices are tracked
in a side list so snapshot/reset stay O(occupied). A straggler record racing
reset/snapshot is memory-safe (fixed buffer) and at worst loses its own unwanted
late write.

ARC: the recorder bridge did Unmanaged.takeUnretainedValue().onCompare(...) per
comparison, which the compiler brackets with a retain/release pair. Switch to
_withUnsafeGuaranteedRef — sound because the context co-owns the observer and is
alive for the whole call. Helps every cmp strategy (boundary, I2S,
comparisonCoverage).

Measured (release, boundarystate, CMP_PER_INPUT=256): cmp-dispatch as a fraction
of whole-process CPU dropped 61.5% → 40.5%. os_unfair_lock and the ARC refcount
atomics are gone from the cmp subtree; the residual is the correctness-required
task-keyed routing plus genuine accumulator work.

TDD: 503 PropertyTestingKitTests green, including a new concurrent-records
lock-free-safety test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… analysis

Adds a diagnostic that answers "is comparison volume concentrated in a few
filterable sites, or is it the relevant SUT comparisons themselves?". When
PTK_CMP_CENSUS=<path> is set (checked once in a constructor), sancov_dispatch_cmp
records per comparison-site PC: fire count and min |arg1-arg2|, into a fixed
lock-free open-addressing table; dladdr-symbolized and dumped atexit. Zero
production cost when unset (one predicted-not-taken atomic load of g_cmp_census,
same pattern as g_ever_covered).

Used to settle the "measure fewer comparisons" lever (notebook Finding 41g): on
the stlc SUT, 57% of comparison volume is Swift array bounds checks
(_checkIndex/count on the de Bruijn [Typ] context), 31% genuine SUT-logic, and
only ~12% is safely-droppable (generator/enum-equality/value-witness). The
distance-approach filter is dead (most sites reach distance 0). So the cmp
throughput cost is mostly intrinsic to measuring the value signal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… strategies

Drops synthesized/stdlib comparison sites from the trace-cmp hot path so the
per-comparison dispatch tax concentrates on SUT-logic comparisons that actually
witness bugs. Classifier sancov_cmp_should_drop() flags stdlib methods (Swift
module / standard-substitution types — Array bounds checks, count getters,
buffer copies), synthesized Equatable (__derived_enum_equals), value witnesses,
and everything sancov_is_compiler_generated already flags (outlined/metadata).
Keeps user-module SUT functions. Verdict cached per comparison-site PC (dladdr +
classify on first fire, relaxed-load lookup thereafter); opt-in via env, default
off = one predicted-not-taken load. Counter reports distinct dropped sites via
on-demand scan, never per-comparison (a contended RMW there halved throughput).

Measured on stlc: 89% comparison volume dropped (360.9M->39.5M), only SUT-logic
symbols survive; boundarystate throughput 1.57x (412k->646k tests/6s). On the
shift_var_leq cell the bug-witnessing `i < c` comparison is kept, so guidance is
preserved; narrows but does not close the gap to newedge (which pays no cmp tax).

503 PTK tests green; new SanCovCmpDropTests covers the classifier (red->green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Synthesized/stdlib comparison sites carry no SUT signal; taxing them only slows
the trace-cmp value-aware strategies. Flip PTK_CMP_DROP_SYNTHESIZED to default-on
(measured +1.3-1.6× boundarystate throughput on stlc); opt out with =0 when a bug
can manifest as a value at a stdlib bounds-check comparison. Hot-path branch hint
flipped to expect-enabled. 503 PTK tests pass (the lone GlobalEverCovered failure
is the pre-existing edge-bitmap cross-test flake — passes in isolation, unrelated
to the cmp path).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on drops)

The profile (Finding 41i) showed macOS thread-local access (tlv_get_addr +
pthread_getspecific + sancov_tls) is ~34% of the process and the single biggest
floor. The drop check needs only the pc argument and the global table (a plain
atomic load) — not the thread-local block — so moving it ahead of sancov_tls()
lets every DROPPED comparison return without paying a tlv_get_addr. On stlc that
skips ~320M TLS fetches/run (89% of comparisons drop). Safe ahead of the re-entry
guard: the drop check fires no instrumented comparisons (SanCovHooks/libc aren't
trace-cmp instrumented), a dropped site never reaches the recorder, and kept
sites still hit the guard. stlc boundarystate ~638k vs ~446k tests/6s opt-out
(1.43×). 503 PTK tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oughput)

The heaviest-trace profile (Finding 41k) showed boundaryState's once-per-iteration
decide closure was the #1 cost cluster (~14%), allocation-bound: it rebuilt a
perSite dictionary from the sites array just to feed boundarySignFeatures, which
itself re-allocated the near-site selection, the features array, and a fresh
sides() [UInt8] per site in nested loops — every iteration.

Refactor: boundarySignFeatures gains an allocation-light core that reads sign
mask + distance straight from the sites array into reused inout buffers (features
+ a near-site scratch kept in DistanceState), and walks the ≤3 mask bits inline
via @inline(__always) forEachSide (no per-site array). The dict-keyed signature
stays as a thin wrapper for tests/non-hot callers. The engine drops the perSite
dict entirely.

Result (stlc boundarystate, release): throughput ~638k → ~1.03M tests/6s = 1.6×;
malloc/free fell out of the whole-process top 10 (was ~8.3% combined). 505 PTK
tests pass (2 new: array-core parity with the dict reference + buffer reuse).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… bitmap

The edge-union novelty oracle ("has any run ever hit this edge?") was a
Set<UInt32> in every strategy (.newEdge, .boundaryDistance/State, .comparison-
Coverage), inserted per covered edge per iteration. SanCov edge indices are dense
and bounded by the guard count, so a packed bit array (EdgeUnionBitmap) gives the
same Set.insert(_:).inserted answer in O(1) with no hashing and no per-insert
allocation after warm-up.
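
A minimal packed-bit-array sketch with the same `inserted` contract. `EdgeBitmapSketch` is an illustrative name, and the grow-on-demand branch is an assumption added so the example tolerates an index beyond the initial guard count.

```swift
struct EdgeBitmapSketch {
    private var words: [UInt64]

    init(guardCount: Int) {
        words = Array(repeating: 0, count: (guardCount + 63) / 64)
    }

    /// Same answer as Set.insert(_:).inserted: true only the first time an edge is seen.
    mutating func insert(_ edge: UInt32) -> Bool {
        let word = Int(edge >> 6)                  // which 64-bit word
        let bit: UInt64 = 1 << UInt64(edge & 63)   // which bit inside it
        if word >= words.count {
            // Assumed fallback: grow if the index exceeds the initial guard count.
            words.append(contentsOf: repeatElement(0, count: word - words.count + 1))
        }
        if words[word] & bit != 0 { return false }
        words[word] |= bit
        return true
    }
}
```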

Profile-confirmed: for .newEdge (pure edge union) Set.insert + Hasher are now
GONE from the whole-process top 10 — its hot path is purely SanCov/routing.

Honest scope note: this barely moves .boundaryState, whose dominant Set is NOT
seenEdges but seenSigns (the Set<UInt64> of sign features — high volume from the
pairwise cross-product). seenEdges was the minor share. That is a separate lever
(seenSigns can't be bitmapped — sparse 64-bit hashes). The bitmap is still strictly
cheaper than the Set everywhere and never worse.

509 PTK tests pass (4 new EdgeUnionBitmap tests; the lone EntropicPolicy failure
is the known RNG-tie flake — fails ~1/3 in isolation, unrelated). Workloads need
`rm -rf .build` to pick up the new file (SwiftPM stale-plan gotcha).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…6pt of CPU)

The heaviest-trace profile (Finding 41n) showed the boundaryState decide cost was
seenSigns — a Set<UInt64> of sign-combination features inserted per iteration —
costing Set.insert 4.16% + Hasher 2.56% ≈ 6.7% of the process. But those feature
keys are ALREADY splitmix64-mixed hashes (encodeBoundarySign1/2), so Set<UInt64>
re-hashed uniform bits with SipHash for nothing.

FeatureHashSet: open-addressing UInt64 membership keyed on the value's own
(pre-mixed) low bits — no Swift Hasher — with the Set.insert(_:).inserted
contract and a separately-tracked literal 0 (the empty-slot sentinel). Same trick
BoundarySiteAccumulator uses for PC keys. Swapped into seenSigns
(.boundaryState/.boundaryDistance) and seenFeatures (.comparisonCoverage).
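
A rough sketch of that shape, assuming linear probing, a power-of-two table and a 50% load limit. `PremixedSetSketch` is a hypothetical name and none of these choices are confirmed by the FeatureHashSet source:

```swift
/// Hypothetical open-addressing set for UInt64 keys that are already
/// well-mixed hashes. Slot value 0 means "empty", so the literal key 0
/// is tracked in a separate flag.
struct PremixedSetSketch {
    private var slots: [UInt64]
    private var count = 0
    private var containsZero = false

    init(capacity: Int = 16) {
        // A power-of-two capacity lets a mask act as the index function.
        precondition(capacity > 0 && (capacity & (capacity - 1)) == 0)
        slots = [UInt64](repeating: 0, count: capacity)
    }

    /// Returns true when `key` was not present before (Set.insert contract).
    mutating func insert(_ key: UInt64) -> Bool {
        if key == 0 {
            let wasNew = !containsZero
            containsZero = true
            return wasNew
        }
        // Keep the load at or below 1/2 so probing always finds an empty slot.
        if (count + 1) * 2 > slots.count { grow() }
        return place(key)
    }

    private mutating func place(_ key: UInt64) -> Bool {
        let mask = slots.count - 1
        // Index straight from the key's low bits: no Hasher involved.
        var index = Int(truncatingIfNeeded: key) & mask
        while slots[index] != 0 {
            if slots[index] == key { return false }
            index = (index + 1) & mask
        }
        slots[index] = key
        count += 1
        return true
    }

    private mutating func grow() {
        let old = slots
        slots = [UInt64](repeating: 0, count: old.count * 2)
        count = 0
        for key in old where key != 0 { _ = place(key) }
    }
}
```

This only behaves well when the keys are uniformly distributed, which is the premise stated above; with unmixed keys the low-bit index would cluster.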

Re-profile (stlc boundarystate): Set.insert + Hasher GONE; FeatureHashSet.insert
is 1.11% (down from ~6.7%). Throughput ~1.0M → ~1.18M tests/6s. TLS routing
(tlv_get_addr) is again the clear top cost; the decide-side data structures are
now cheap. 513 PTK tests pass (4 new; lone EntropicPolicy fail is the known
RNG-tie flake). Workloads need `rm -rf .build` for the new file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twof and others added 17 commits June 15, 2026 15:48
The generator/mutator runs instrumented SUT code (a type-directed
generator calls getTyp; mutators validate mutants the same way), but
that coverage is NOT the property under test — it is reset away before
the test runs. Dispatching and recording it (routing + first-hit +
BoundarySiteAccumulator.record) was ~25% of the process on stlc.

Add a per-thread `suppressed` flag in SanCovTLS; the dispatch_edge and
dispatch_cmp hooks early-return when it is set. The fuzz loop sets it
around the straight-line input-production block (no await, no thread
hop) and clears it before resetCoverage so the test is always measured.
Per-thread, so a mutating engine can't suppress a concurrently-testing
one.

Profile (stlc shift_var_leq, release): generateMutation subtree drops
33% -> 22% inclusive; get_current_coverage_map / record_first_hit /
BoundarySiteAccumulator.record vanish from it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SyncBox (an NSLock-backed test utility) had leaked onto the per-dispatch
observer path. Env-gated PTK_LOCK_METRICS instrumentation measured the
damage on stlc/SinglePreserve (synchronous SUT, so contention was 0 — the
cost is the uncontended acquisition itself):

  hitCountBuckets.state  14,272,900 acquisitions / 20,000 tests = ~714/test
  boundaryDistance.state  1,386,368 / 1,385,000              = ~1/test

Rewrites (all mirror the lock-free BoundarySiteAccumulator: fixed-capacity
flat per-slot atomics, claimed-index O(occupied) drain):

- HitCountAccumulator replaces the per-edge SyncBox in
  HitCountBucketsStrategy.onEdge. Throughput 20k -> ~565k tests/6s (~28x).
- AtomicFeatureSet replaces the per-comparison SyncBox in
  ComparisonCoverageStrategy.onCompare.
- ComparisonDictionary's OSAllocatedUnfairLock ring -> a flat atomic ring
  with a monotonic atomic cursor (I2S record path, per comparison).
- UncheckedBox (SyncBox's API minus the lock) replaces the per-iteration,
  decide-only SyncBoxes in newEdge / signatureMatch / boundaryDistance /
  pathTrie. The engine-lifetime halves moved to single-thread holders
  (decide is serialized per engine; observers never touch them).

Kept as SyncBox: boundarySign.diag (a file-scope global shared across
engines, diagnostic-only). LockMetrics kept (env-gated, zero-cost off) for
future audits. PathTrie.advance's NSLock is deferred (ordered mutable trie,
not a flat accumulator) — tracked in doordash-oss#46.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Edge-only strategies (newEdge / hitCountBuckets / pathTrie /
signatureMatch) attach no comparison recorder, yet the trace-cmp hooks
still fired sancov_dispatch_cmp ~33-35M times / 6s — each paying
sancov_tls() + get_current_coverage_map() — and ZERO reached a consumer
(measured via the new env-gated PTK_DISPATCH_COUNT counters). That was
~43% of all dispatch-TLS, pure waste (Finding 43).

Add a process-global g_cmp_recorder_count, adjusted on the 0<->nonzero
transition in sancov_context_set_cmp_recorder (exchange the old bits) and
in end_measurement (sever). sancov_dispatch_cmp now early-returns after
the drop filter but BEFORE the TLS fetch when no recorder is attached
anywhere (and the census is off) — both are plain global loads, no TLS.
Race-free: the count moves only at measurement setup/teardown. A mixed
run with any cmp-consuming engine keeps count > 0, so a real consumer is
never suppressed.
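
The counting rule can be modelled as below. This is a single-threaded Swift model for illustration only: the real gate is a process-global atomic in the C hooks, and `CmpRecorderGateSketch`, its context ids and its method names are hypothetical:

```swift
/// Single-threaded model of the cmp-recorder gate.
struct CmpRecorderGateSketch {
    private(set) var attachedCount = 0
    /// Hypothetical context id -> "slot currently holds a recorder".
    private var hasRecorder: [Int: Bool] = [:]

    /// Models setting a context's recorder: swap the slot, then adjust the
    /// count only on an empty<->occupied transition (re-attach is a no-op).
    mutating func setRecorder(attached: Bool, context: Int) {
        let wasAttached = hasRecorder.updateValue(attached, forKey: context) ?? false
        if attached && !wasAttached { attachedCount += 1 }
        if !attached && wasAttached { attachedCount -= 1 }
    }

    /// Models end_measurement severing the context's recorder.
    mutating func endMeasurement(context: Int) {
        setRecorder(attached: false, context: context)
    }

    /// The early-out test: dispatch only if some consumer could see the event.
    func shouldDispatch(censusEnabled: Bool) -> Bool {
        attachedCount != 0 || censusEnabled
    }
}
```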

Result: newEdge/hitCountBuckets cmp_dispatches 33-35M -> 0; newEdge clean
throughput ~+8.5%; boundaryState unchanged (consumes every comparison).
TDD: SanCovCmpRecorderGateTests asserts the count tracks attach / re-attach
/ clear / end_measurement. 48 SanCov + 75 strategy/routing tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reorder sancov_dispatch_cmp so the process-global cmp-recorder gate
(g_cmp_recorder_count == 0 && census == NULL) runs FIRST, before the
synthesized/stdlib drop filter. Edge-only strategies (newEdge /
hitCountBuckets / pathTrie / signatureMatch) attach no cmp recorder, so
they now skip both cmp_drop_should_skip (~6%, Finding 43) and the TLS
fetch on every kept comparison — the gate is two plain global atomic
loads, the cheapest possible early-out.

Behavior is unchanged: when count==0 the hook already did nothing
observable; this only makes that path cheaper. Confirmed by profiling
newedge — cmp_drop_should_skip drops entirely out of the
sancov_dispatch_cmp subtree. Guarded by existing
SanCovCmpRecorderGateTests + SanCovCmpDropTests (7 green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Newer Swift toolchains conform integer types to the stdlib's
AtomicRepresentable (Synchronization) in addition to swift-atomics'
AtomicValue, so the bare `UInt64.AtomicRepresentation` becomes ambiguous
and the build breaks after an Xcode/SDK bump — with no change to our code
or the swift-atomics pin. `AtomicRep<T>` constrains the lookup to
swift-atomics' AtomicValue, resolving it unambiguously. Applied across the
lock-free accumulators that allocate flat atomic-storage buffers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… walk

Task doordash-oss#49: an instrumented SUT value-`destroy` fires a coverage edge while a
FuzzResult's Corpus is torn down inside a cooperative worker's ~AsyncTask.
`g_coverage_inheritance_key` is process-global and never cleared, so routing
walks the dying task's task-local chain; its head (task+136) is freed and
poisoned (in `sancov_is_valid_pointer`'s coarse range but unmapped), so the
raw memcpy faults -> SIGSEGV. (This is the real story behind the old
"cross-session g_target_context UAF" guess.)

Fix, pinned with a real .ips + live lldb (which refuted a first count-gate
hypothesis: g_active_ctx_count was 2 at the fault, not 0):
1. Runtime-authoritative gate: the runtime's own swift_task_localValueGet is
   task-state-aware and returns 0 without faulting on a dying task. Track
   whether it ran and only fall back to the manual chain walk when it did
   NOT — removing the fragile walk from the common path (also a perf win).
2. Fault-safe reads: safe_read() wraps vm_read_overwrite(mach_task_self(), ..)
   so the residual fallback can't fault either — a poisoned chain just ends
   the walk (= no inherited context).

Verified by a TDD regression (mmap+munmap a page = valid-range-but-unmapped
poison at task+136, expect the seam returns 0, no fault; deterministic
SIGSEGV pre-fix) and 3000 lldb relaunch-until-crash iterations clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y flake

EntropicPolicyTests asserted on weighted-draw outcomes that depend on
FastRNG, which is unseedable — so ties broke nondeterministically and the
suite flaked. Rather than seed FastRNG (it stays a zero-dispatch thread-local
shim on the hot path), follow swift-dependencies' own withRandomNumberGenerator
pattern: a `\.fastRandomNumberGenerator` dependency wrapping FastRNG via
Point-Free's WithRandomNumberGenerator. WeightedPoolCore resolves it once in
init and draws scalars inside its @Sendable closure (capturing only locals,
never self). Tests override it with a seeded DeterministicRNG (SplitMix64) for
reproducible draws.
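
For reference, a seeded SplitMix64 generator conforming to the standard RandomNumberGenerator protocol can be sketched as follows. `SplitMix64Sketch` is a hypothetical name; the constants are the standard SplitMix64 ones, and the actual DeterministicRNG may differ in shape:

```swift
/// Hypothetical seeded generator in the style described above.
struct SplitMix64Sketch: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    /// Standard SplitMix64 step: advance by the golden-ratio increment,
    /// then run the output through the finalizer.
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}
```

Two instances built from the same seed produce the same sequence, so a weighted draw made with `Int.random(in:using:)` resolves ties the same way on every run.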

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The A/B (Findings 45-48) showed the boundary-sign vocabulary bought zero
bug-finding over the raw distance gradient, so remove it entirely: delete
BoundarySignEncoding + the boundaryState strategy and its tests, and drop the
boundarySigns field threaded through the acceptance/pool path. The
accumulator now stores ONLY the per-site minimum |arg1 - arg2| — one atomic
word per bucket, no packing, no sign, no near-window.

Hot-path perf on the per-comparison cmp channel:
- absoluteDifference is branchless: wrap-once + conditional negate lowers to
  subs+cneg (2 instr) vs the two-subtraction ternary's sub+subs+csel (3).
- record() is @inline(__always). The module builds non-WMO (one .o per
  file), so without it record stayed an out-of-line cross-file tail-call from
  onCompare. It has a single hot caller, so folding it in (with its already-
  inlined hash/probe/updateSlot helpers) costs no code size and makes the
  whole per-comparison path a single call-free leaf — verified by disasm.
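
A sketch of the wrap-once form, assuming unsigned 64-bit operands (the operand type is an assumption, and `absoluteDifferenceSketch` is a hypothetical name). It shows the source-level shape only and makes no claim about the instructions a given compiler emits:

```swift
/// |a - b| for unsigned operands using one wrapping subtraction
/// plus a conditional negate.
@inline(__always)
func absoluteDifferenceSketch(_ a: UInt64, _ b: UInt64) -> UInt64 {
    let wrapped = a &- b                    // wraps modulo 2^64 when a < b
    return a < b ? 0 &- wrapped : wrapped   // negating undoes the wrap
}
```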

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- project.pbxproj: add AtomicRep.swift / DeterministicRNG.swift, drop the
  deleted BoundarySignEncoding + boundary-sign/state test files, so the
  project keeps building from Xcode.
- Package.resolved: remove the now-unused package-jemalloc pin.
- open-instruments.sh: launch the benchmark ourselves with the patched
  runtime on DYLD_LIBRARY_PATH and attach xctrace (the Instruments GUI
  "Choose Target" path can't set it, so the binary aborts on
  _swift_coroFrameAlloc); add an optional time-limit arg.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mentation

Move EmitCmpTrace + TagCompilerGenerated from /tmp into LLVMPasses/, built by
scripts/build-llvm-plugins.sh (auto-run at the top of build-local-toolchain.sh)
against the patched toolchain's LLVM. EmitCmpTrace emits the trace_cmp callbacks
we want (dropping trap-guard comparisons); TagCompilerGenerated tags compiler-
generated functions NoSanitizeCoverage. Dylibs build to .build/llvm-plugins
(gitignored) since they link this toolchain's LLVM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Package.swift computes the plugin dylib path from #filePath and a loadPass()
helper, then loads TagCompilerGenerated (and EmitCmpTrace for the cmp targets,
replacing stock -sanitize-coverage=…,trace-cmp) on every instrumented target.
TagCompilerGenerated loads first so EmitCmpTrace skips tagged functions. Builds
green with the runtime filters still present (they become near-no-ops).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The TagCompilerGenerated/EmitCmpTrace plugins filter compiler-generated edges and
trap-guard comparisons at compile time, so the runtime filters are dead code.

Remove from SanCovHooks (2413->2029 lines): the lazy edge filter
(sancov_apply_edge_filter, sancov_is_compiler_generated, g_edge_state, the
first-fire classify + on-disk cache + atexit, and the per-edge consult in
__sanitizer_cov_trace_pc_guard) and the cmp drop filter (sancov_cmp_should_drop,
cmp_drop_should_skip, g_cmp_drop_table, cmp_drop_init, PTK_CMP_DROP_SYNTHESIZED).
Drop the header decls, SanCovCounters.applyEdgeFilter/filteredEdgeCount, and the
FuzzEngine.run call. The hot path loses a per-edge g_edge_state load and a per-cmp
drop-table probe; the compiler-generated classifier now lives only in the plugin.

Delete the now-obsolete SanCovEdgeFilterTests + SanCovCmpDropTests, trim
PCResolutionTest's classifier tests, and remove the applyEdgeFilter() calls from
the determinism tests. Suite green (SanCovTests 37, ScheduleControlTests 32,
PropertyTestingKitTests 503, GenericTimerPollerTests 26); determinism 10/10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CoverageStrategy.compose([...]) / .combined(with:) unions N strategies into one
engine: every substrategy's onEdge/onCompare/onReset runs, every decision runs
(no short-circuit, so each updates its own novelty oracle) and the results are
OR-ed. Pool vocabularies are merged — features namespaced per substrategy via a
SplitMix64 finalizer so two strategies' raw values can't collide in the shared
ownership space, boundaryDistances merged per site by the closer value.
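
The two merge rules can be sketched as below. How the substrategy index is folded into the feature before finalizing is an assumption made here for illustration, and all three function names are hypothetical:

```swift
/// Standard SplitMix64 finalizer (a bijection on UInt64).
func splitMix64FinalizeSketch(_ x: UInt64) -> UInt64 {
    var z = x
    z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
    z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
    return z ^ (z >> 31)
}

/// Namespaces a raw feature by mixing in a per-substrategy constant.
/// Assumption: the index is folded in with XOR before finalizing.
func namespacedFeatureSketch(_ raw: UInt64, substrategy: Int) -> UInt64 {
    let salt = UInt64(substrategy + 1) &* 0x9E3779B97F4A7C15
    return splitMix64FinalizeSketch(raw ^ salt)
}

/// Merges per-site distances, keeping the closer (smaller) value per site.
func mergeDistancesSketch(_ a: [UInt64: UInt64], _ b: [UInt64: UInt64]) -> [UInt64: UInt64] {
    a.merging(b) { min($0, $1) }
}
```

Because the finalizer is a bijection, one raw value offered by two different substrategies maps to two different namespaced features. Features with different raw values from different substrategies can still collide, but only with hash-level probability.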

Add .boundaryDistanceOnly: the comparison channel without the edge-coverage
union, so composing it with an edge strategy (e.g.
.pathTrie.combined(with: .boundaryDistanceOnly)) doesn't double-count edges.
Plain .boundaryDistance is unchanged (== .newEdge unioned with this).

The admission side composes for free: a composed engine publishes both the
namespaced features and the merged distances, and PoolAdmission
.boundaryDistanceOwnership already culls over resolvedFeatures + boundaryDistances
(featureOwnership covers the edge-only case). No new ledger needed.

4 composition tests; full suite 507 green. (Requires the SUT built with the cmp
channel for the boundary axis to fire.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Discovery

The library default for `MutationScheduler.weightedPool(admission:)` was
`everyDiscovery` — every strategy-accepted input joins the pool and nothing
ever leaves. On stlc that floods the pool with ~2400 entries whose median
wire size is 421 chars (max 1469); mutating one of those giants rarely lands
on the relevant node.

Flip the default to `featureOwnership` (libFuzzer-style REDUCE): each feature
is owned by the smallest witness, larger owners are evicted. Measured on the
clean stlc baseline this collapses the live pool to ~20 entries of median
size 49, and the mean *executed* term shrinks 5x (323 -> 65 chars).
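
The REDUCE rule can be sketched as a small ledger. `ReduceLedgerSketch` is a hypothetical illustration, not the library's ledger; it assumes an entry is evicted once it owns no feature and that an equal-size input does not steal:

```swift
/// Hypothetical REDUCE ledger: each feature is owned by its smallest witness.
struct ReduceLedgerSketch {
    private var ownerOf: [UInt64: Int] = [:]   // feature -> owning entry id
    private var sizeOf: [Int: Int] = [:]       // live entry id -> input size
    private var ownedCount: [Int: Int] = [:]   // live entry id -> features owned
    private var nextID = 0

    var liveCount: Int { sizeOf.count }

    /// Offers an input. Returns its new id plus the entries it evicted,
    /// or nil when it claims no feature and therefore stays out of the pool.
    mutating func offer(features: Set<UInt64>, size: Int) -> (id: Int, evicted: [Int])? {
        var claimed: [UInt64] = []
        for feature in features {
            // An existing owner keeps the feature unless the newcomer is strictly smaller.
            if let owner = ownerOf[feature], let ownerSize = sizeOf[owner], ownerSize <= size {
                continue
            }
            claimed.append(feature)
        }
        if claimed.isEmpty { return nil }

        let id = nextID
        nextID += 1
        sizeOf[id] = size
        ownedCount[id] = claimed.count

        var evicted: [Int] = []
        for feature in claimed {
            if let old = ownerOf[feature] {
                ownedCount[old, default: 0] -= 1
                if ownedCount[old] == 0 {      // owns nothing any more: evict
                    ownedCount[old] = nil
                    sizeOf[old] = nil
                    evicted.append(old)
                }
            }
            ownerOf[feature] = id
        }
        return (id: id, evicted: evicted)
    }
}
```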

On the hard de Bruijn mutant shift_var_leq the flip finds the bug 20/20 at
median 4.0s vs everyDiscovery's 17/20 at 6.7s — better detection AND speed,
the direct payoff of smaller, better-targeted mutation parents.

Callers that want the old keep-everything behavior still pass
`admission: .everyDiscovery` explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urce)

Faithful re-port of PR doordash-oss#54 (trace-cmp-substrate, tip 06d5799) onto post-doordash-oss#45
main. Source compiles; tests follow in the next commit.

- SanCovHooks C: trace-cmp hooks (sancov_dispatch_cmp, cmp recorder slots +
  lifecycle, in_cmp_recorder re-entry guard, TLS coalescing, cmp drop filter,
  lock-free cmp accumulator, dispatch suppression) grafted onto main's file via
  3-way merge, preserving main's inheritance-walk fix.
- Seam: CoverageEngine gains onCompare/boundaryDistances; CoverageStrategy
  attaches a ComparisonObserver and captures boundaryDistances; CoverageProbe
  threads boundaryDistances through CoverageVerdict and now hosts the I2S
  dictionary (it owns the measurement context).
- Strategies: comparisonCoverage, boundaryDistance(+Only), compose/combined;
  lock-free AtomicFeatureSet/HitCountAccumulator/BoundarySiteAccumulator/
  EdgeUnionBitmap/FeatureHashSet; AtomicRep + UncheckedBox + LockMetrics.
- Scheduler: PoolIterationOutcome.boundaryDistances; PoolEvent.inserted gains
  parent/claimed; PoolAction.setMutationDepth; WeightedPoolCore chains mutation
  depth in next() and judges on the whole outcome; BoundaryDistanceLedger +
  boundaryDistanceOwnership admission; AdaptiveDepthPolicy/Math; SchedulerProbe.
- I2S: ComparisonDictionary (moved to FuzzCore) + Int mutator hook; opt-in via
  PTK_INPUT_TO_STATE / task-local.
- Build: in-repo LLVM pass plugins (EmitCmpTrace, TagCompilerGenerated) replace
  the runtime edge/cmp filters; Package.swift loads them via -load-pass-plugin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…wo grafts

Brings in the branch's trace-cmp tests and reconciles the existing suites with
the post-doordash-oss#43/doordash-oss#44/doordash-oss#45 architecture. All three targets green: PropertyTestingKitTests
532, SanCovTests 37, ScheduleControlTests 32 (601 total, 0 unexpected failures).

Test ports (assertions preserved, API surface only):
- Strategy tests use the non-generic CoverageEvaluator (evaluate(context, client))
  and read CoverageAcceptance.boundaryDistances.
- Scheduler tests drive WeightedPoolCore via the generic init / WeightedPoolHarness
  (generationRatio replaces burstLength/focusOnInsert), decide() not next()-directive,
  and the 5-tuple PoolEvent.inserted(parent:claimed:).
- EntropicPolicyTests/.inserted bumped to the 5-tuple; InheritanceTest/FuzzStats kept
  at main's versions; SanCovEdgeFilterTests deleted (runtime filter is compile-time now).
- New suites: Cmp/ComparisonObserver/GlobalEverCovered, AdaptiveDepth*, Atomic/HitCount/
  EdgeUnion/FeatureHashSet/BoundarySite accumulators, ComparisonDictionary, Fuzz/IntInputToState,
  LockMetrics, SanCovCmpRecorderGate/Suppression; DeterministicRNG support.

Two source grafts completed (missed in the source commit):
- LockMetrics moved to FuzzCore (SyncBox, its consumer, lives there) + SyncBox's
  metrics/acquire()/forceMetrics integration grafted onto main's FuzzCore SyncBox.
- chainMutate restored as a public SchedulerSupport helper; WeightedPoolCore.next()
  calls it (preserves AdaptiveDepthChainTests' direct assertion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
xcodegen generate to pick up the new trace-cmp sources/tests and the
LockMetrics/ComparisonDictionary/AtomicRep moves into FuzzCore. project.yml
keeps -sanitize-coverage=edge,pc-table without -load-pass-plugin: as on the
source branch, the LLVM pass plugins are wired for the CLI build only
(Package.swift + build-llvm-plugins.sh), so the Xcode build is edge-coverage
only and the cmp channel / I2S are CLI-only — the branch's existing design.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
@twof twof force-pushed the trace-cmp-substrate branch from 06d5799 to a4927a1 June 19, 2026 02:04
@twof

twof commented Jun 19, 2026

Contributor Author

Structurally re-ported onto post-#45 main

This branch was force-updated with a structural re-port of the original 45-commit stack onto the rearchitected main (scheduler-owns-pool / FuzzCore split / CoverageProbe seam / UInt64 features). The prior tip is preserved at trace-cmp-substrate-prerebase (06d5799).

What's included (full faithful re-port):

  • trace-cmp substrate: C cmp hooks (sancov_dispatch_cmp, per-context cmp recorders + lifecycle, in_cmp_recorder re-entry guard, TLS coalescing, drop filter, lock-free cmp accumulator), grafted onto main's SanCovHooks.c preserving its inheritance-walk fix; ComparisonObserver bridge.
  • seam: CoverageEngine.onCompare/boundaryDistances; CoverageProbe/CoverageVerdict thread boundaryDistances (mirroring features) and host the I2S dictionary.
  • strategies: comparisonCoverage, boundaryDistance(+Only), compose/combined; lock-free accumulators (Atomic/HitCount/BoundarySite/EdgeUnion/FeatureHashSet).
  • scheduler: boundaryDistanceOwnership admission, AdaptiveDepthPolicy, setMutationDepth + depth chaining, SchedulerProbe.
  • I2S mutation (the measured win on direct-dataflow/magic-value bugs): ComparisonDictionary + Int mutator hook, opt-in via PTK_INPUT_TO_STATE.
  • build: in-repo LLVM pass plugins (EmitCmpTrace, TagCompilerGenerated) replace the runtime edge/cmp filters.

Eval status (unchanged from the original branch): the cmp acceptance strategies (comparisonCoverage, boundaryDistance, adaptive-depth) remain documented-negative baselines — edge coverage still wins; they are not defaults. The substrate + I2S are the keepers.

Tests: PropertyTestingKitTests 532, SanCovTests 37, ScheduleControlTests 32 — all green (per-target; the full-parallel run has the pre-existing #49 sancov flake).

Not benchmarked (per the agreement on the #45 stack).

Xcode note (pre-existing design): project.yml keeps -sanitize-coverage=edge,pc-table without -load-pass-plugin — the LLVM passes are wired for the CLI build only (Package.swift + build-llvm-plugins.sh). So the Xcode build is edge-coverage-only; the cmp channel + I2S are CLI-only, exactly as on the source branch.

🤖 Generated with Claude Code

twof and others added 11 commits June 18, 2026 19:26
…clock flake)

PoollessSchedulerTests.poollessSchedulerRuns drove a 0.3s wall-clock fuzz and
asserted totalInputs > 0. Under the full 120-way parallel suite, probe setup +
campaign-scope install can exhaust the 0.3s budget before the first iteration
(the loop forces a time check on iteration 1, FuzzStateMachine.swift:199/211),
so the run executes zero inputs and the expectation fails. This was the lone
non-known issue in the otherwise-green full-parallel run.

Fix mirrors the committed FuzzStatsAccountingTests cure: a generous wall-clock
ceiling plus a deterministic stopAfter(N) iteration-counter plugin, so
"the pool-less scheduler drives the engine" holds regardless of machine load.
Verified 8/8 clean full-parallel runs (632 tests, 0 real failures).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…uator/Ledger)

First step of the feedback-agnostic refactor. Introduces the metric-agnostic
ownership primitive and the two evaluators that hold the ownership criterion:

- OwnershipLedger: feature(domain,id) -> owner roster, reassignment + eviction,
  sequential entry IDs. Knows nothing of size/distance — it records claims an
  evaluator already decided.
- EdgeOwnershipEvaluator: REDUCE (smallest input owns an edge; ties don't steal).
- BoundaryDistanceEvaluator: lowest |arg1-arg2| per pc owns; ties break toward
  the smaller input (a refinement over BoundaryDistanceLedger, which never broke
  ties).

These are the decomposed halves of today's FeatureOwnershipLedger and
BoundaryDistanceLedger; not yet wired (additive). 10 new tests green.
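
The boundary criterion reduces to a small comparison, sketched here with hypothetical names (`BoundaryOwnerSketch`, `boundaryClaimSketch`). It assumes a candidate that ties on both distance and size does not steal, which this commit message does not spell out:

```swift
/// Hypothetical record of the current owner of one comparison site (pc).
struct BoundaryOwnerSketch {
    let distance: UInt64   // |arg1 - arg2| observed at the site
    let inputSize: Int
}

/// True when a candidate should take ownership of the site.
func boundaryClaimSketch(distance: UInt64, inputSize: Int,
                         over owner: BoundaryOwnerSketch?) -> Bool {
    guard let owner = owner else { return true }   // unowned site: first witness claims it
    if distance != owner.distance {
        return distance < owner.distance           // closer to the boundary wins
    }
    return inputSize < owner.inputSize             // tie on distance: smaller input wins
}
```

The result is only a claim decision; recording it stays the ledger's job, which matches the split described above.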

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…lel cmp path

Phase 1c. The admission no longer owns the ownership *criterion* — it composes
the decomposed pieces:

- featureOwnership now runs EdgeOwnershipEvaluator + BoundaryDistanceEvaluator
  and records their claims in the generic OwnershipLedger. The boundary
  evaluator is inert unless the strategy publishes boundaryDistances, so this
  one admission subsumes the former boundaryDistanceOwnership: add the cmp
  signal and the same admission culls over it.
- Deleted FeatureOwnershipLedger, BoundaryDistanceLedger, and the
  boundaryDistanceOwnership admission (all subsumed).
- cmp is no longer a parallel ownership mechanism — it's a peer evaluator
  feeding the same roster as edges.

Test suites migrated to compose evaluator+ledger; the unchanged pool-level
admission suites (now pinned to .featureOwnership) validate behavior is
preserved. 543 PropertyTestingKitTests pass, 0 non-known failures. xcodeproj
regenerated for the deleted/added files.

Note: boundary ties now break toward the smaller input (a refinement; the old
ledger never broke boundary ties). The boundaryDistances signal carrier on the
verdict stays until Phase 3 gives cmp its own probe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…tation providers

Phase 2 of the feedback-agnostic refactor. Coverage is no longer a privileged
top-level fuzz() knob; it is the provider the chosen scheduler vends.

- SchedulerFactory gains makeProviders() -> [any InstrumentationProvider]
  (default []). The engine installs only providers whose key matches the
  scheduler's requiredProbes, so a pool-less / cmp-only / exotic scheduler pays
  for nothing it didn't ask for.
- MutationScheduler becomes a namespace (caseless enum); weightedPool(...) gains
  a coverageStrategy: param and returns a concrete WeightedPoolFactory that
  vends the CoverageProvider. (Dropping the wrapper struct also sidesteps a
  patched-toolchain parameter-pack parser fault that fired when a forwarding
  member sat beside the pack-generic makeScheduler.)
- fuzz()/regress() and FuzzEngine convenience drop coverageStrategy; runEngines
  builds providers via scheduler.makeProviders(). Replay forces .alwaysInteresting
  by passing weightedPool(coverageStrategy: .alwaysInteresting).
- Call sites migrated: `coverageStrategy: X` -> `scheduler:
  MutationScheduler.weightedPool(coverageStrategy: X)`. TestHelpers' helpers now
  take `scheduler:` to mirror the new API.
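
The install rule in the first bullet can be sketched as a filter. The protocol shape, the string keys and every name below are assumptions for illustration, not the library's InstrumentationProvider API:

```swift
/// Hypothetical provider shape: each provider is identified by a key.
protocol ProviderSketch {
    var key: String { get }
}

struct CoverageProviderSketch: ProviderSketch { let key = "coverage" }
struct CmpProviderSketch: ProviderSketch { let key = "cmp" }

/// Keeps only the vended providers the scheduler actually asked for.
func providersToInstallSketch(vended: [any ProviderSketch],
                              requiredProbes: Set<String>) -> [any ProviderSketch] {
    vended.filter { requiredProbes.contains($0.key) }
}
```

With an empty `requiredProbes` the result is empty, which corresponds to the pool-less case paying for no instrumentation.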

543 PropertyTestingKitTests pass, 0 non-known failures; full build (incl.
benchmarks) clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…ger (Phase 3)

Demonstrates the payoff of the Signal/Evaluator/Ledger decomposition: the
scheduler carries the comparison signal itself (.boundaryDistance) — vends the
cmp-recording provider and culls over the comparison-distance axis through the
same OwnershipLedger as edges, with the boundary evaluator a peer of the edge
evaluator. The engine names no signal.

Scope note (in the test): a *pure* cmp-only run (.boundaryDistanceOnly, no edge
axis) additionally needs an instrumented comparison in the SUT (a bare stdlib
`Int ==` lowers to the uninstrumented stdlib at -Onone, emitting no trace_cmp
in-target) and a corpus retention signature that isn't SparseCoverage. Those are
the remaining Phase-3 items; this pins the wired-and-green part. 544 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…o compiler bug)

Phase 2 made MutationScheduler a caseless enum to dodge a "pack expansion ...
can only appear in a variadic type" error I attributed to a patched-toolchain
parser fault. That attribution was WRONG: a clean build of the wrapper-struct
design (struct: SchedulerFactory wrapping `any SchedulerFactory`, forwarding
makeScheduler + makeProviders, with a static weightedPool returning Self)
compiles fine and 544 tests pass. The earlier failures were incremental-build
staleness during rapid edit->build cycles, not a compiler bug — five reduced
single-file variants all compile clean, and the full clean build of the real
wrapper does too.

Restores the wrapper struct (the design originally intended): keeps the "vend
any scheduler, not only the weighted pool" promise as a value type rather than a
namespace. weightedPool returns MutationScheduler again; call sites use it as
any SchedulerFactory unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…et (B′)

The corpus used to make the scheduling decision — it checked each input for
novel coverage and saved it. That is no longer its job: the scheduler's
ownership ledger owns retention. So stop maintaining the corpus during the run
and build it once, after the loop, from what the scheduler vends.

- AnyScheduler.observe returns Void (folds signals into the scheduler's own
  state); new AnyScheduler.snapshot() -> [(repeat each Input)] is read once by
  the engine at run-end to build the corpus. WeightedPoolCore.snapshot vends its
  live pool (evicted entries already gone); a pool-less scheduler vends [].
- Corpus is no longer coverage-keyed: delete CorpusEntry.sparseCoverage,
  Corpus.signatures / mergeCoverageAndAdd / addIfInteresting, and
  SubmitToCorpusAction.sparseCoverage; add() drops its sparse: param. Plugin
  failures still append during the run via addToCorpus, just without coverage.
- Cross-engine mergeCorpusSnapshots dedups by encoded input bytes (CorpusEntry
  already encodes exactly the input array) instead of coverage signatureHash;
  made internal so the dedup contract is unit-tested.

On-disk format is unchanged — coverage was never serialized. 545
PropertyTestingKitTests green, including two new pins: the engine builds the
corpus from snapshot() (not observe), and cross-engine merge dedups by input.
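
A sketch of the input-identity dedup, assuming each entry exposes its encoded input as a byte array. `CorpusEntrySketch` and `mergeSnapshotsSketch` are hypothetical names, not the actual mergeCorpusSnapshots signature:

```swift
/// Hypothetical corpus entry carrying only its encoded input bytes.
struct CorpusEntrySketch {
    let encodedInput: [UInt8]
}

/// Merges per-engine snapshots, keeping the first entry for each distinct
/// encoded input and preserving encounter order.
func mergeSnapshotsSketch(_ snapshots: [[CorpusEntrySketch]]) -> [CorpusEntrySketch] {
    var seen = Set<[UInt8]>()
    var merged: [CorpusEntrySketch] = []
    for entry in snapshots.joined() where seen.insert(entry.encodedInput).inserted {
        merged.append(entry)
    }
    return merged
}
```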

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
…mitToCorpus

Make the corpus a value materialized from the scheduler at run-end, with no
mid-run writers at all — the engine holds no corpus during a run.

- Remove the submitToCorpus action + SubmitToCorpusAction. The shrinking plugin
  no longer persists the minimized failing input; it still minimizes, biases
  mutation (selectForMutation), and records the issue. Failure retention for
  regression is tracked separately (doordash-oss#55).
- With no mid-run writers, drop FuzzStateMachine's held Corpus and its ctor
  param. FuzzStateMachineResult.corpus is now a CorpusSnapshot built at run-end
  from scheduler.snapshot(). FuzzEngine drops corpusRegistry.getCorpus() and the
  redundant .snapshot().
- Delete now-dead types: CorpusEntryType, FailureInfo, CorpusClient/
  CorpusRegistryProtocol (the registry was only the engine's empty-Corpus
  factory). CorpusEntry is now just { input, scheduleBytes }.
- Keep the mutable Corpus class as a test-only value builder (the engine no
  longer touches it); regenerate the Xcode project for the deleted files.

The scheduler is per-engine and torn down after the run, so the result owns a
materialized snapshot — there is no live-reference view into a freed scheduler.
545 PropertyTestingKitTests green, 0 failures (ShrinkingPluginTests now expects
2 plugin actions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
There is no mutable corpus type any more. The engine already materialized its
result as a CorpusSnapshot at run-end (previous commit); the Corpus class was
only a test-time value builder. Remove it and the tests that exercised its
stateful behavior.

- Delete the Corpus class from Corpus.swift; keep CorpusSnapshot (now also
  exposes `inputs`).
- Remove tests of Corpus statefulness: CorpusTests.swift, and
  testCorpusIsEmpty / testCorpusInputs / testCorpusComplexTypes in
  PropertyBasedSelfTests (the cross-engine input-identity merge test stays).
- Convert strategy tests that used a Corpus as a counting sink to assert on the
  evaluator's own signal (boolean / snapshot count); drop ~9 dead `let corpus`
  leftovers; build CorpusSnapshot directly where a snapshot was needed.
- Anchor realisticCoverageGapTest's expected gap line to the function via #line
  (funcAnchor + 12) so edits elsewhere in the file no longer shift it — removing
  a line above it had moved the SUT and broken the hardcoded expectedLine.

Regenerate the Xcode project for the deleted file. 541 PropertyTestingKitTests
green, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP
signal-evaluator-ledger is the successor architecture: signal-agnostic
scheduler (AnyScheduler/SchedulerFactory + makeProviders), the
Signal/Evaluator/Ledger decomposition (OwnershipLedger + edge/boundary
evaluators), and corpus-as-view (no mutable Corpus, no sparseCoverage,
no submitToCorpus). It deletes the bolted-on cmp path
(BoundaryDistanceLedger) and the coverage-keyed corpus that
trace-cmp-substrate still carried.

Per decision, SEL supersedes trace-cmp-substrate: this merge takes SEL's
tree wholesale and makes trace-cmp-substrate the canonical branch going
forward. Both parents are recorded.

Superseded (NOT carried into the tree) — trace-cmp-substrate perf/substrate
work that optimized the old design and must be re-applied to the new
architecture if still wanted:
- hot-path perf: lock excision (Finding 42), no-SipHash feature sets,
  bitmap edge union, per-iteration alloc kills, cmp-recorder gating,
  cmp drop filter, dispatch suppression during gen/mutation
- distance-only boundary accumulator; runtime->compile-time filters
- featureOwnership-by-default tuning

Tree is identical to signal-evaluator-ledger @ 1c4f840 (541 tests green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014mrEZMehSXEHXv6vvGvzsP