T1: stamp internal monotonic _hyp_ingest_seq at the decorateRow flush chokepoint#153
Merged
philcunliffe merged 1 commit intoJun 25, 2026
Conversation
… chokepoint Adds the row-resident, append-monotonic int64 watermark column that the incremental-sink-reads design (LLP 0040, Candidate B) is built on. This is the producer half of the seam: nothing reads the column yet, and it is stripped by INTERNAL_FIELDS from every existing readRows consumer, so it merges with zero behavioural change. - New `createIngestSeqAllocator` (src/core/cache/ingest-seq.js): a crash-safe, never-regressing monotonic int64 allocator. Reserve-before-stamp — a block of seqs is durably persisted (nextSeq advanced via atomic write-rename) before any seq in it is handed to a row, so a resumed flush never re-issues a seq <= one already stamped/exported. Gaps are tolerated; regressions are not. The counter is cache-global (<cacheRoot>/_hyp_ingest_seq.json), not a per-partition cursor.json, because decorateRow runs before rows are grouped into source= partitions and two spool paths (live + backfill) can feed one partition — only a cache-wide counter keeps every partition's seq subsequence strictly increasing. (LLP 0040 §7 records this refinement of risk #2.) - streaming-reader.js: decorateRow stamps `_hyp_ingest_seq` (the cache_row_id hash is still computed over the original row, so seq does not perturb dedup); the chunk's columns gain the additive nullable INT64 column so it lands in the Iceberg schema and rides a compaction generation swap verbatim; the field joins INTERNAL_FIELDS. - spool.js wires one cache-global allocator into the flush loop. - Tests: allocator monotonicity / never-regress-across-restart / reserve- before-stamp / concurrency; streamFlushFile stamping; and a storage round-trip proving the seq persists in Iceberg, increases per row, and is stripped from readRows. Verified separately that the column survives a real compaction swap. Task-Id: T1 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ca8c4ee
into
integration/incremental-sink-reads
6 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements task T1 of integration/incremental-sink-reads (LLP 0042 plan).
Adds the row-resident, append-monotonic int64
_hyp_ingest_seqwatermark column (LLP 0040 Candidate B) — the producer half of the seam. Nothing reads it yet;INTERNAL_FIELDSstrips it from every existingreadRowsconsumer, so this merges with zero behavioural change.createIngestSeqAllocator(src/core/cache/ingest-seq.js): crash-safe, never-regressing monotonic int64 allocator. Reserve-before-stamp — a block of seqs is durably persisted (atomic write-rename) before any is stamped, so a resumed flush never re-issues a seq<=one already stamped. Cache-global (<cacheRoot>/_hyp_ingest_seq.json) rather than per-partitioncursor.json, becausedecorateRowruns before rows are grouped intosource=partitions and two spool paths (live + backfill) can feed one partition (LLP 0040 §7 records this refinement of risk [codex] Remove OpenTelemetry npm dependencies #2).decorateRowstamps the seq (hash still over the original row); the chunk columns gain the additive nullable INT64 column so it lands in the Iceberg schema and rides a compaction generation swap verbatim; the field joinsINTERNAL_FIELDS.streamFlushFilestamping; storage round-trip (seq persists, increases per row, stripped fromreadRows).Checks:
npm test(1399 pass / 0 fail),npm run typecheck,npm run lintall green;cache_spool_batching+cache_lifecycle_maintenancesmokes pass; compaction-swap preservation verified.Task-Id: T1
🤖 Generated with Claude Code