Skip to content

T1: stamp internal monotonic _hyp_ingest_seq at the decorateRow flush chokepoint#153

Merged
philcunliffe merged 1 commit into
integration/incremental-sink-readsfrom
task/incremental-sink-reads/T1
Jun 25, 2026
Merged

T1: stamp internal monotonic _hyp_ingest_seq at the decorateRow flush chokepoint#153
philcunliffe merged 1 commit into
integration/incremental-sink-readsfrom
task/incremental-sink-reads/T1

Conversation

@philcunliffe

Copy link
Copy Markdown
Contributor

Implements task T1 of integration/incremental-sink-reads (LLP 0042 plan).

Adds the row-resident, append-monotonic int64 _hyp_ingest_seq watermark column (LLP 0040 Candidate B) — the producer half of the seam. Nothing reads it yet; INTERNAL_FIELDS strips it from every existing readRows consumer, so this merges with zero behavioural change.

  • createIngestSeqAllocator (src/core/cache/ingest-seq.js): crash-safe, never-regressing monotonic int64 allocator. Reserve-before-stamp — a block of seqs is durably persisted (atomic write-rename) before any is stamped, so a resumed flush never re-issues a seq <= one already stamped. Cache-global (<cacheRoot>/_hyp_ingest_seq.json) rather than per-partition cursor.json, because decorateRow runs before rows are grouped into source= partitions and two spool paths (live + backfill) can feed one partition (LLP 0040 §7 records this refinement of risk [codex] Remove OpenTelemetry npm dependencies #2).
  • decorateRow stamps the seq (hash still over the original row); the chunk columns gain the additive nullable INT64 column so it lands in the Iceberg schema and rides a compaction generation swap verbatim; the field joins INTERNAL_FIELDS.
  • Tests: allocator monotonicity / never-regress-across-restart / reserve-before-stamp / concurrency; streamFlushFile stamping; storage round-trip (seq persists, increases per row, stripped from readRows).

Checks: npm test (1399 pass / 0 fail), npm run typecheck, npm run lint all green; cache_spool_batching + cache_lifecycle_maintenance smokes pass; compaction-swap preservation verified.

Task-Id: T1

🤖 Generated with Claude Code

… chokepoint

Adds the row-resident, append-monotonic int64 watermark column that the
incremental-sink-reads design (LLP 0040, Candidate B) is built on. This is
the producer half of the seam: nothing reads the column yet, and it is
stripped by INTERNAL_FIELDS from every existing readRows consumer, so it
merges with zero behavioural change.

- New `createIngestSeqAllocator` (src/core/cache/ingest-seq.js): a crash-safe,
  never-regressing monotonic int64 allocator. Reserve-before-stamp — a block
  of seqs is durably persisted (nextSeq advanced via atomic write-rename)
  before any seq in it is handed to a row, so a resumed flush never re-issues
  a seq <= one already stamped/exported. Gaps are tolerated; regressions are
  not. The counter is cache-global (<cacheRoot>/_hyp_ingest_seq.json), not a
  per-partition cursor.json, because decorateRow runs before rows are grouped
  into source= partitions and two spool paths (live + backfill) can feed one
  partition — only a cache-wide counter keeps every partition's seq subsequence
  strictly increasing. (LLP 0040 §7 records this refinement of risk #2.)
- streaming-reader.js: decorateRow stamps `_hyp_ingest_seq` (the cache_row_id
  hash is still computed over the original row, so seq does not perturb dedup);
  the chunk's columns gain the additive nullable INT64 column so it lands in
  the Iceberg schema and rides a compaction generation swap verbatim; the field
  joins INTERNAL_FIELDS.
- spool.js wires one cache-global allocator into the flush loop.
- Tests: allocator monotonicity / never-regress-across-restart / reserve-
  before-stamp / concurrency; streamFlushFile stamping; and a storage round-trip
  proving the seq persists in Iceberg, increases per row, and is stripped from
  readRows. Verified separately that the column survives a real compaction swap.

Task-Id: T1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe philcunliffe merged commit ca8c4ee into integration/incremental-sink-reads Jun 25, 2026
6 checks passed
@philcunliffe philcunliffe deleted the task/incremental-sink-reads/T1 branch June 25, 2026 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant