Skip to content

Incremental sink reads: per-row ingest-seq watermark for forward + blob sinks (LLP 0039/0040)#159

Draft
philcunliffe wants to merge 16 commits into
masterfrom
integration/incremental-sink-reads
Draft

Incremental sink reads: per-row ingest-seq watermark for forward + blob sinks (LLP 0039/0040)#159
philcunliffe wants to merge 16 commits into
masterfrom
integration/incremental-sink-reads

Conversation

@philcunliffe

@philcunliffe philcunliffe commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Implements the incremental sink reads change set — sinks read only rows added since their last successful export instead of re-reading the whole partition every tick.

  • Request: LLP 0039 (escalated from central forward sink has no cursor — re-reads & re-sends the whole dataset every tick #122) · Design: LLP 0040 · Plan: LLP 0042
  • T1 stamp a monotonic _hyp_ingest_seq at the decorateRow write chokepoint (crash-safe allocator)
  • T2 extend storage.readRows with cursor-aware since/continuation + null-seq migration
  • T3 persist a per-(sink instance, partition) watermark keyed by logical partition path
  • T4 wire the central forward sink to incremental read
  • T5 wire the core blob sink (local-fs + s3)
  • T6 exactly-once tests across retention prune + compaction generation swap

Each task landed as its own verified --no-ff merge with green CI. Server idempotency ledger retained as the in-flight retry net.

Change-Set: incremental-sink-reads

philcunliffe and others added 15 commits June 25, 2026 14:20
Cover LLP 0039 with a neutral-minted design for a per-(sink, partition)
watermark so the central forward sink and the core blob sink read and ship
only rows added since their last successful export.

Recommends a monotonic per-row _hyp_ingest_seq column over snapshot ancestry
(does not survive a compaction generation swap) and a content-addressed
seen-set (cannot meet the bounded-read goal). Specifies the readRows since/
continuation extension, the persisted watermark contract keyed by the
generation-stable logical partition path, application to both sinks, and the
exactly-once argument across retention prunes and compaction swaps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refine LLP 0040 into six small, independently-mergeable tasks along the
producer -> read-API -> persistence -> consumer seam:

  T1 stamp _hyp_ingest_seq at the decorateRow chokepoint (deps: [])
  T2 readRows since/continuation + readRowsSince (deps: T1)
  T3 per-(sink,partition) watermark store keyed by logical path (deps: T2)
  T4 wire the central forward sink (deps: T2,T3)
  T5 wire the core blob sink (deps: T2,T3)
  T6 exactly-once tests across retention prune + compaction swap (deps: T4,T5)

Verified with `neutral ready incremental-sink-reads --json`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… chokepoint

Adds the row-resident, append-monotonic int64 watermark column that the
incremental-sink-reads design (LLP 0040, Candidate B) is built on. This is
the producer half of the seam: nothing reads the column yet, and it is
stripped by INTERNAL_FIELDS from every existing readRows consumer, so it
merges with zero behavioural change.

- New `createIngestSeqAllocator` (src/core/cache/ingest-seq.js): a crash-safe,
  never-regressing monotonic int64 allocator. Reserve-before-stamp — a block
  of seqs is durably persisted (nextSeq advanced via atomic write-rename)
  before any seq in it is handed to a row, so a resumed flush never re-issues
  a seq <= one already stamped/exported. Gaps are tolerated; regressions are
  not. The counter is cache-global (<cacheRoot>/_hyp_ingest_seq.json), not a
  per-partition cursor.json, because decorateRow runs before rows are grouped
  into source= partitions and two spool paths (live + backfill) can feed one
  partition — only a cache-wide counter keeps every partition's seq subsequence
  strictly increasing. (LLP 0040 §7 records this refinement of risk #2.)
- streaming-reader.js: decorateRow stamps `_hyp_ingest_seq` (the cache_row_id
  hash is still computed over the original row, so seq does not perturb dedup);
  the chunk's columns gain the additive nullable INT64 column so it lands in
  the Iceberg schema and rides a compaction generation swap verbatim; the field
  joins INTERNAL_FIELDS.
- spool.js wires one cache-global allocator into the flush loop.
- Tests: allocator monotonicity / never-regress-across-restart / reserve-
  before-stamp / concurrency; streamFlushFile stamping; and a storage round-trip
  proving the seq persists in Iceberg, increases per row, and is stripped from
  readRows. Verified separately that the column survives a real compaction swap.

Task-Id: T1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a back-compatible `opts.since` to `readRows` and a cursor-aware
`readRowsSince` sibling that pairs each internal-stripped row with its
`after` continuation token, so the forward and blob sinks can read only
rows added since their last durable export.

- `scanRowsFromTable` gains an `opts.since` (bigint `_hyp_ingest_seq`
  watermark) and applies a `seq > since` predicate as a yielded-row
  filter. It is NOT pushed into icebird's `scan({ where })`: icebird
  couples file/row-group pruning with a per-row match that drops nulls
  (`null > since` is false), which would silently skip the legacy
  null-seq rows the migration must preserve (LLP 0040 risk #1). The
  design names this yielded-row filter as the fallback; a future
  null-aware icebird filter can add the file-skip optimization on top.
- null-seq = new: a row whose `_hyp_ingest_seq` is null/absent
  (pre-upgrade) is always yielded, so the one-time migration is at worst
  a full re-export, never silent data loss. A table that never carried
  the seq column yields everything.
- `after` is a monotonic high-water of real seqs, so a null-seq row
  carries the prior watermark forward and progress never regresses even
  when the scan visits seqs out of order (interleaved sources).
- `opts` absent ⇒ byte-for-byte the pre-existing full scan, so every
  current caller is untouched until it opts in.
- Update the kernel-types decl: `SinkContinuation`, `ReadRowsOptions`,
  the extended `readRows`, and the new `readRowsSince`.

Tests cover back-compat, after-token monotonicity, no-new-rows ≈0,
incremental new rows, the null-seq migration contract, the pure-legacy
(no seq column) table, and invalid-token rejection.

Task-Id: T2

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add `src/core/sinks/watermarks.js`, the shared incremental-read watermark
store for the forward and blob sinks (LLP 0042 task T3). Files live at
`<stateDir>/watermarks/<dataset>/<partition-key>.json` and carry the
versioned `{ continuation, exportedRowCount, updatedAt }` record.

The key is derived from the partition's stable LOGICAL path
(`datasets/<dataset>/<partition...>` relative to cacheRoot), never the
physical `tableDir` inside it — the hinge of design constraint (B): the
watermark reads straight through a compaction generation swap and a
retention front-prune. `write` is atomic write-rename (the
`writeCursor`/`writeProgress`/`ingest-seq.js` idiom); a corrupt or absent
record reads back as null so a sink re-exports from the start
(at-least-once + downstream dedup) rather than silently skipping rows.

Adds `SinkWatermarkKey`/`SinkWatermarkRecord`/`SinkWatermarkStore` types,
anchors LLP 0040 §3 (`#watermark-contract`), and a unit suite covering key
derivation (logical-not-tableDir, nesting, sanitize, sentinel, escape
guard), round-trip, in-place advance, atomic-no-temp, malformed-token
rejection, and corrupt-file null.

@ref LLP 0040#watermark-contract

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-Id: T3
Switch `forwardPartition` from a full-partition `storage.readRows(tablePath)`
scan to the cursor-aware `storage.readRowsSince(tablePath, { since })`, driven
by a per-(sink instance, partition) watermark loaded from the sink's stateDir
watermark store (T3). Each acked chunk advances the watermark to that chunk's
last `after` token (ship first, advance second), so a crash re-sends at most one
chunk and the server idempotency ledger now backstops only a bounded in-flight
suffix instead of the whole partition. A tick with no new rows yields zero rows,
sends zero chunks, and writes zero bytes.

Chunking (MAX_CHUNK_ROWS/MAX_CHUNK_BYTES), the Retry-After backpressure loop,
and `batchIdForChunk` derivation are unchanged. A missing/unreadable watermark
or underivable key falls back to a full scan (at-least-once + server dedup),
never a silent skip.

Implements task T4 of incremental-sink-reads (LLP 0040 §3-4, 0042).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-Id: T4
Wire the local-fs and s3 blob destinations to the cursor-aware
readRowsSince surface (T2) and the persisted per-(sink instance,
partition) watermark store (T3), so each tick exports only rows added
since the sink's last durable PUT.

- New shared helpers in src/core/sinks/incremental.js (exported via
  hypaware/core/sinks): openIncrementalRows (peek-to-decide-empty,
  self-tracking rowCount + high-water lastAfter, feeds the unchanged
  encoder.encodePartition contract), withSeqRangeFilename (embeds
  [sinceSeq,lastSeq] before the extension), watermarkKeyFor, and
  createInstanceWatermarkStore.
- Empty new-row set writes no blob (skip, 0 bytes).
- The output filename/object key embeds the [sinceSeq,lastSeq] range so a
  crash-retry re-PUTs the same key (idempotent overwrite) — the blob
  sink's stand-in for the central server ledger.
- The watermark advances only after the durable write/PUT.
- PluginPaths.stateDir is per-plugin, not per sink instance, so the
  watermark store is scoped under the instance to honor the per-(sink
  instance, partition) contract.
- Tests: helper unit tests, a local-fs end-to-end incremental test
  (ranged filename, watermark advance, skip-empty, new range, cumulative
  count), and rewritten s3-export-batch tests (skip-empty, ranged key,
  watermark advance, idempotent re-PUT on lost watermark) preserving the
  prior retry-semantics coverage.
- Updated the two local-fs blob-sink smokes to match the ranged filename
  (note: both are pre-existing red on the integration branch for an
  unrelated reason — the driver hands the sink the drained spool path).

Task-Id: T5
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the LLP 0040 exactly-once proof the T4/T5 unit suites can't give
(they stub storage): a deterministic acceptance test driving the REAL
kernel cache, retention enforcer, maintainCache compaction, BOTH sinks
(central forward + core local-fs blob), and the driver outbox respool.

Covers: ~0 bytes on a no-new-rows tick, ~N on an N-new tick, exactly-once
across a retention front-prune and a compaction generation swap for both
sinks, and watermark vs. driver-outbox respool composition (suffix-only
replay + idempotent batch-id / re-PUT). Adds a hermetic smoke proving the
blob sink reads straight through a compaction generation swap via the real
driver. Anchors LLP 0040 section 5 so the @ref resolves.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-Id: T6
…ance

The forward sink derived its X-Hyp-Batch-Id from the per-tick chunk ordinal
(chunkIndex). After an earlier chunk was acked and the (sink, partition)
watermark advanced, a respool re-read only the un-acked suffix and re-numbered
it from 0 — minting a NEW batch-id for a chunk that may already have committed
on the server (ambiguous ack / commit-then-5xx). The server idempotency ledger
could not dedupe the redelivery, double-storing rows and breaking the spec's
'ledger covers mid-batch retries' guarantee.

Key the batch-id on the chunk's start seq (the watermark it resumes from)
instead. A respooled suffix reproduces the same [startSeq, body] and thus the
same id, so the ledger dedupes it; distinct chunks still differ because each
row's _hyp_ingest_seq is unique. Adds a cross-tick regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe

Copy link
Copy Markdown
Contributor Author

🤖 neutral dual-review — verdict: block

Blast radius / risk: HIGH — kernel storage read API (QueryStorageService.readRows/readRowsSince) + the cache write chokepoint (decorateRow seq stamp) + both production sinks. A skip in the seq/watermark contract silently drops rows forever; a dup breaks the spec's exactly-once guarantee. Reviewed at 35d77bd (pre-fix head).

Both reviewers ran. Both verdicts: block.

The exactly-once skeleton is sound where it's been proven: the T6 acceptance suite is non-vacuous — it drives the real cache, real retention enforcer, real maintainCache compaction, both real sinks, and the real driver outbox, and genuinely asserts no skip/no dup across a retention front-prune and a compaction generation swap. I empirically confirmed the load-bearing claims hold: retention deletes only sub-watermark rows; the seq rides compaction verbatim; the logical-partition watermark key reads straight through the tableDir swap; the allocator is crash-safe and never-regressing (reserve-before-stamp). Exactly-once across retention+compaction: HELD. The blockers below are in the retry/redelivery and migration edges the T6 suite does not exercise.

Codex key points

Codex (gpt-5.5, xhigh) returned 3 blockers + 3 majors + 1 nit. Reconciled:

  • Batch-id keyed on per-tick chunk index (blocker) — confirmed & FIXED (see below).
  • Null-seq legacy rows are "new forever" (blocker) — confirmed. Not a one-time migration.
  • Cross-process duplicate seqs → skip (blocker) — confirmed, with caveat: design-documented out-of-scope; concurrent same-partition flush already crashes on a cursor.json race, bounding it.
  • Spool progress advances after append (major) — pre-existing spool behaviour, not introduced here; compaction dedupes by _hyp_cache_row_id. Downgraded to minor.
  • appendRowsToPartition bypasses seq stamping (major) — confirmed; same root cause as the null-seq blocker (permanent null-seq rows).
  • Retention uncoupled from watermark (major) — explicitly documented out-of-scope in LLP 0040 §6 risk [codex] clarify first-run picker and npx daemon bin #5; not a defect of this PR. Note only.
  • Inline import() types in test files (nit) — valid style nit; left as-is.

My key points (independent Claude pass)

  • I independently reproduced the batch-id instability and the null-seq re-export against the real kernel cache.
  • Null-seq re-export is a true exactly-once duplication, not just waste: scanRowsFromTable unconditionally yields null-seq rows for any since, and they never advance the monotonic after. So after the first export advances the watermark to a real seq, every later tick re-yields the legacy rows — the blob sink writes them into a second object (legacy rows appear in both the 0-maxSeq blob and the later maxSeq-maxSeq blob → duplicated), and the forward sink re-sends them under a different body+batch-id (server stores them again). Affects every deployment with a pre-upgrade cache until retention prunes the legacy rows (≥30 days), and permanently for any dataset written via appendRowsToPartition. Violates spec acceptance "≈0 bytes on a no-new-rows tick" and "no row duplicated".
  • Latent fragility (minor, not currently reachable): the forward sink's per-chunk watermark advance to the running-max after is only skip-safe because the physical read order happens to be seq-monotonic in every reachable path (sequential flush, post-compaction — both I verified empirically; the one out-of-order trigger, concurrent same-partition flush, crashes first). It is not a guaranteed icebird contract — icebergDataSource even documents "record-count order" for file iteration. If file read order ever ceases to be seq-ascending, a mid-tick respool would skip the lower-seq suffix. The blob sink is robust (single end-of-scan advance); the forward sink is not. Worth a guard or an explicit ordered-read contract.

Fix pushed (round 2 will re-review)

Batch-id stability (hypaware-core/plugins-workspace/central/src/sink.js): batchIdForChunk now keys on the chunk's start seq (the watermark it resumes from) instead of the per-tick chunkIndex. A respooled suffix reproduces the same [startSeq, body] → same id → server ledger dedupes the redelivery; distinct chunks still differ (each row's _hyp_ingest_seq is unique). Added a cross-tick regression test (a respooled suffix re-sends the un-acked chunk with the SAME batch-id). Full suite green (1448 tests).

The two remaining blockers (null-seq migration; cross-process seq safety) are design-level — LLP 0040 §6 risk #1 itself flags the null-seq default as "must be nailed down (the wrong default … silently skips null-seq rows (data loss))". Resolving them is an author decision (stamp legacy rows during compaction / a one-time backfill migration, or track a legacyExported flag in the watermark; assert/guard single-writer for the allocator), so I did not guess at them in review — a wrong choice here risks silent data loss.

Findings

  • Severity: blocker — Forward sink batch-id keyed on per-tick chunkIndex: after a watermark advance, a respooled (committed-but-unacked) chunk gets a new id and the server ledger can't dedupe it → duplicate rows. Violates spec "ledger covers mid-batch retries". FIXED in 1141f3d.
  • Severity: blocker — Null-seq legacy rows yielded every tick and never captured by the watermark → blob sink duplicates them across objects; forward sink re-sends them (server re-stores). Not a one-time migration; violates exactly-once + "≈0 bytes on no-new". src/core/cache/iceberg/store.js:330, src/core/cache/storage.js:227. Escalated (design decision).
  • Severity: blocker/major — appendRowsToPartition writes committed rows without stamping _hyp_ingest_seq → permanent null-seq rows; same family as above; such datasets never reach a no-new-rows steady state. src/core/cache/partition.js. Escalated.
  • Severity: major (caveat) — Allocator mutex is in-process only; two processes over one cache root (e.g. daemon + hyp backfill) can reserve the same block → duplicate seqs → a same-partition duplicate split across an export boundary is skipped forever. Documented out-of-scope in LLP 0040 §7; concurrent same-partition flush crashes on a cursor race, bounding reachability. Escalated.
  • Severity: minor — Forward sink per-chunk watermark advance is skip-safe only under seq-monotonic read order, which icebird does not contract (vs. the blob sink's robust single advance). Not currently reachable; latent. hypaware-core/plugins-workspace/central/src/sink.js.
  • Severity: minor — Spool advances writeProgress after appendChunk; a crash in between re-appends rows with new seqs (dup content until compaction dedup). Pre-existing spool behaviour, not introduced here. src/core/cache/spool.js:136.
  • Severity: note — Retention is intentionally uncoupled from the watermark (LLP 0040 §6 risk [codex] clarify first-run picker and npx daemon bin #5): a lagging sink's un-exported front rows can still be pruned. Documented, out-of-scope.
  • Severity: nit — Inline import('...') types in new test/smoke files violate the repo JSDoc rule. test/core/sink-watermarks.test.js:21, test/core/sink-incremental-acceptance.test.js:214, hypaware-core/smoke/flows/incremental_sink_compaction.js:209.

@philcunliffe philcunliffe added the neutral:stuck neutral attempted this but cannot complete it autonomously — needs a human label Jun 26, 2026
@philcunliffe

Copy link
Copy Markdown
Contributor Author

🤖 neutral: held for a design decision (review verdict = block)

The dual-review found real exactly-once defects. One is fixed; three are design-level and need a human decision — neutral will not guess at choices that risk silent data loss, so this PR is neutral:stuck (won't auto-advance).

Fixed in this PR (1141f3d, regression test added, suite green):

  • Forward-sink batch-id was keyed on the per-tick chunk ordinal, so a respooled committed-but-unacked chunk got a new id and the server ledger couldn't dedupe it → duplicate rows. Now keyed stably.

Needs a design decision (escalated, not fixed):

  1. null-seq migration is not one-time. Legacy (pre-upgrade) rows have no _hyp_ingest_seq, are yielded every tick, and never advance the watermark → the blob sink duplicates them across objects and the forward sink re-sends them every tick. The design's own risk register flagged the null-seq default as "must be nailed down — wrong default risks silent data loss."
  2. appendRowsToPartition doesn't stamp _hyp_ingest_seq → it writes permanent null-seq rows (same failure family as [codex] Add durable cache spool #1).
  3. Allocator mutex is in-process only → the daemon and a concurrent hyp backfill can hand out the same seq block → duplicate/skew.

What holds: the core watermark design is sound — exactly-once across a retention front-prune and a compaction generation swap was verified, and the T6 acceptance suite is non-vacuous (drives real cache/retention/compaction/both sinks).

Recommended path: revise the design (LLP 0040) to nail down (a) the null-seq backfill/stamping strategy for legacy + appendRowsToPartition rows, and (b) cross-process allocator safety; then re-implement the affected tasks. Remove neutral:stuck once the design direction is set and neutral will resume.

@philcunliffe

Copy link
Copy Markdown
Contributor Author

Dual-agent review — block

  • Verdict: block
  • Risk class: high (cache/sinks data-integrity; the exactly-once guarantee is the whole point of this PR)
  • Reviewers: Codex (gpt-5.5) + 5 Claude reviewers, run in an isolated git worktree at the PR head.
  • Advisory only: no merge was attempted.

The headline guarantee — exactly-once incremental reads — does not hold under the failure modes the design itself anticipates. One blocker (3 reviewers agree) plus four exactly-once-violating majors.

🛑 Blocker — watermark advances by max-seq over an unordered scan → silent permanent row skip

readRowsSince yields rows in physical scan order while emitting after = the running max seq seen (src/core/cache/storage.js:227-234); the forward sink advances its durable watermark per acked chunk to that max (central/src/sink.js:288-292). When physical order ≠ seq order — which the design itself anticipates (LLP 0040 risk #3 / §7: interleaved live+backfill spool; and post-compaction sortOrder re-sort, maintenance.js:517-518) — an early acked chunk advances the watermark past lower-seq rows sitting in a later, un-acked chunk. A crash or single chunk failure (poison 4xx / backpressure) between chunks then leaves since above those rows and the next tick filters them out (seq <= since) forever — silent permanent data loss, the opposite of the exactly-once claim. Every test appends rows in strictly increasing seq order, so this exact scenario is unproven. (Codex + 2 Claude reviewers.)
Fix: either scan ordered by _hyp_ingest_seq (so a shipped prefix truly means "all seqs ≤ last"), or advance the forward-sink watermark only once at end-of-partition (as the blob sink does). Add a forward-sink test with seqs out of physical order across a chunk boundary, fail the later chunk, re-tick, and assert the lower-seq rows are still delivered.

Major — legacy null-seq rows re-export every tick (never reach a no-new-rows steady state)

Null-seq rows are always yielded and never advance the watermark (iceberg/store.js:328-331, storage.js:229-230), so a cache upgraded from before this PR (its entire corpus is null-seq) re-exports its whole legacy backlog on every tick — violating the design's "one-time full re-export" (§6 risk #1 / plan T2) and the "≈0 bytes on a no-new-rows tick" acceptance. For a mixed/central partition this is duplication, not just cost (re-POST under a new batch-id after compaction reorders the body; blob S.0-10 then S.10-10). (Codex + 3 Claude reviewers.)
Fix: on first post-upgrade export of a partition, stamp the watermark to its max real seq (or stamp seqs onto legacy rows at next compaction) so re-export is genuinely one-time. Add 2nd-tick tests for pure-legacy and mixed partitions.

Major — mid-retry exactly-once breaks when new rows arrive before the retry (both sinks)

"Ship/PUT first, advance watermark second" is only exactly-once if the in-flight unit is fixed. If a chunk/blob commits server-side, the watermark write is lost, and new rows append before retry, the resumed export reads since=oldWatermark and the suffix grows: the forward chunk body changes → new batch-id → server stores it again; the blob lastSeq changes → proxy.0-50.json and proxy.0-60.json both land, duplicating rows 0–50. _hyp_cache_row_id is stripped from the wire so the server can only dedup by batch-id → at-least-once, not exactly-once. T6 tests only cover the no-new-rows-before-retry case. (2 Claude reviewers.)
Fix: make the in-flight unit boundary independent of later arrivals (cap the resumed chunk/blob at the in-flight seq range, or persist a pre-commit intent record so the retry reproduces the identical body/key), or correct §5 to claim at-least-once.

Major — central forward sink scopes watermarks per-plugin, not per-instance

central/index.js:69 uses createSinkWatermarkStore({ stateDir: sinkCtx.paths.stateDir }) (per-plugin) while local-fs/s3 use createInstanceWatermarkStore({ paths, instanceName: sinkCtx.name }). Two @hypaware/central instances (dual-homing) share one <stateDir>/watermarks/<dataset>/<partition>.json and clobber each other → silent skip/data loss. The @ref comment at central/index.js:68 ("scoped to the sink instance via PluginPaths.stateDir") is factually wrong — PluginPaths.stateDir is per-plugin. (Codex + 1 Claude reviewer.)
Fix: use the public createInstanceWatermarkStore({ paths: sinkCtx.paths, instanceName: sinkCtx.name }) for central too; drop the deep import; correct the @ref.

Major — bounded-reads constraint (C) is not delivered

Design §1 makes "a tick reads ≈N rows for N new rows, independent of total partition size" the deciding constraint that picks Candidate B, and §5 claims file-level max(seq) ≤ watermark pruning. As built there's no pushdown/file-skip (iceberg/store.js:309-331 full-scans the snapshot and applies seq <= since as a yielded-row filter), so reads are O(surviving rows) per tick and the stated advantage isn't realized or tested. (1 Claude reviewer.)
Fix: land the null-aware seq > since file/row-group skip, or correct §1(C)/§5/acceptance to state reads are bounded by surviving-partition size pending the icebird pushdown.

Minor — batchIdForChunk @ref gloss contradicts design §4

Code correctly keys the batch-id by chunkStartSeq (the "keep chunk batch-id stable across watermark advance" fix), but LLP 0040 §4 still describes the old chunkIndex model, so the cited section asserts the opposite of the code and §5's retry argument only holds under the code's keying (central/src/sink.js:251). Update §4 (keep refs honest / living docs).

Nit — inline import('...') types in new test/smoke files

sink-watermarks.test.js:21, sink-incremental-acceptance.test.js:214, incremental_sink_compaction.js:209 use inline import('...') types against the house rule (the PR's production files follow it). Low priority — loosely enforced elsewhere.


🤖 neutral-reconcile dual review (worktree-isolated). Held for a human — neutral never merges.

… sinks (#159 review)

Dual-review found a BLOCKER plus majors against the exactly-once claim. Fixes,
each with a regression test that fails before and passes after:

[BLOCKER] Forward sink advanced the watermark per acked chunk to the scan's
running-max `after`. Because the cache scan is NOT seq-ordered (interleaved
live+backfill spool; post-compaction sortOrder re-sort), an early acked chunk
could checkpoint past lower-seq rows still un-acked in a later chunk; a
between-chunk failure then dropped them forever (`seq <= since`). Fix: advance
the forward watermark ONCE at end-of-partition (as the blob sink already does),
so a partial partition never checkpoints — a failure re-reads the whole
partition and the server ledger dedupes the acked prefix (stable
chunkStartSeq batch ids).
  test: central-forward-chunking "an unordered scan never skips a lower-seq row
  when a later chunk fails".

[MAJOR] Legacy null-seq rows re-exported every tick (never reached steady
state). Add `includeLegacy` to the storage read API (default true). Both sinks
pass `includeLegacy = (no durable watermark)`: a fresh sink exports the
pre-upgrade backlog once, then excludes null-seq rows. Safe because no new
null-seq row can appear post-upgrade (decorateRow stamps every flushed row).
  tests: sink-incremental-acceptance pure-legacy + mixed, forward + blob
  (~0 bytes on 2nd tick, no row in two artifacts).

[MAJOR] Central forward sink scoped watermarks per-PLUGIN, not per-instance —
two @hypaware/central instances clobbered one watermark file. Switch to
createInstanceWatermarkStore({ paths, instanceName }) (matching local-fs/s3);
correct the @ref.
  test: sink-incremental-acceptance "two instances keep independent
  watermarks (no cross-instance skip)".

[MAJOR, doc] Bounded-reads constraint (C): the read is a full scan over the
surviving partition with a yielded-row filter, not the file/row-group skip the
design implied. Correct LLP 0040 §1(C)/§5 to state reads are bounded by
surviving-partition size, with O(N_new) reads pending null-aware icebird
pushdown.

[MINOR/NIT, doc+style] Correct LLP 0040 §4 (batchIdForChunk keys by
chunkStartSeq, not chunkIndex). Hoist inline import('...') types to @import in
sink-watermarks/acceptance tests and the compaction smoke.

[MAJOR, ESCALATED — left unfixed] Mid-retry duplication when a unit commits,
its watermark write is lost, AND new rows append before the retry: the resumed
in-flight unit grows past what committed and the dedup net no longer recognizes
it. The common in-flight retry (no new arrivals) IS covered. A correct fix needs
a pre-commit intent (read `until` upper bound + persisted intent record) across
both sinks — a design-level change to the watermark contract, deferred rather
than half-patched under the exactly-once claim. Documented as LLP 0040 §6 risk
#7 / §5 known gap.

LLP 0040/0042 updated in this commit to match the new behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe

Copy link
Copy Markdown
Contributor Author

neutral:stuck — one design decision remains

The block verdict was driven by the watermark-ordering blocker, which is now fixed and CI-green (commit 7819992): the forward sink advances the watermark only at end-of-partition, with a regression test proving an unordered scan no longer skips a lower-seq row when a later chunk fails. Also fixed this round and verified green (npm test 1453 pass, typecheck/lint clean): legacy null-seq one-time export (includeLegacy), central per-instance watermark scoping, and the bounded-reads (C) + §4 @ref doc corrections. LLP 0040/0042 were updated in the same commit.

Remaining — needs a human design decision (not auto-fixable)

Mid-retry duplication when new rows arrive before the retry (review finding #3). If a chunk/blob commits server-side, the watermark write is lost, and new rows append before the retry, the resumed in-flight unit grows → the forward batch-id changes (server re-stores) / the blob lastSeq changes (overlapping duplicate blob). The fix worker declined a half-fix and documented it as a known gap (LLP 0040 §6 risk #7 / §5), scoping the exactly-once claim to exclude this narrow window. The common retry (no new arrivals) stays covered by the server ledger / seq-range filename.

Choose one:

  • (a) Ratify the documented at-least-once edge for this narrow window as an accepted limitation (LLP 0040 §5/§6 already corrected to say so), or
  • (b) Implement a pre-commit intent mechanism — a read until upper bound + a persisted intent record so a retry caps its read at the committed lastSeq and reproduces the identical body/key, restoring full exactly-once. This is a watermark-contract change affecting both sinks.

Either way the resolution lands as additive commits on this same branch (mint a small decision LLP + update LLP 0040). Full review: the dual-review comment above on this PR.

Held as a draft — neutral will not re-review or churn this while neutral:stuck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

neutral:stuck neutral attempted this but cannot complete it autonomously — needs a human

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant