T2: cursor-aware incremental storage reads (readRows since + readRowsSince) #154
Merged
philcunliffe merged 1 commit into integration/incremental-sink-reads on Jun 25, 2026
Add a back-compatible `opts.since` to `readRows` and a cursor-aware
`readRowsSince` sibling that pairs each internal-stripped row with its
`after` continuation token, so the forward and blob sinks can read only
rows added since their last durable export.
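To illustrate how a sink might consume this contract, here is a minimal sketch. It assumes `readRowsSince` returns an async iterable of `{ row, after }` pairs, which this PR does not confirm; `exportNewRows`, `cursorStore`, and `exportRow` are hypothetical names introduced only for this example.

```javascript
// Hypothetical sink loop, not code from this PR.
// Assumption: readRowsSince(tablePath, { since }) is an async iterable of
// { row, after } pairs; the real return shape may differ.
async function exportNewRows({ readRowsSince, tablePath, cursorStore, exportRow }) {
  // An undefined cursor means "no durable export yet", i.e. a full read.
  const since = await cursorStore.load();
  let after = since;
  let exported = 0;

  for await (const item of readRowsSince(tablePath, { since })) {
    await exportRow(item.row); // the export must succeed before the cursor moves
    after = item.after;        // remember the token paired with the last row
    exported += 1;
  }

  // Persist the cursor only once every export above has resolved,
  // so a crash mid-loop re-reads rows instead of skipping them.
  if (after !== since) await cursorStore.save(after);
  return { exported, after };
}
```

Saving the cursor after the loop, rather than per row, trades some re-export work after a crash for a simpler durability story; a real sink could checkpoint more often.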
- `scanRowsFromTable` gains an `opts.since` (bigint `_hyp_ingest_seq`
watermark) and applies a `seq > since` predicate as a yielded-row
filter. It is NOT pushed into icebird's `scan({ where })`: icebird
couples file/row-group pruning with a per-row match that drops nulls
(`null > since` is false), which would silently skip the legacy
null-seq rows the migration must preserve (LLP 0040 risk #1). The
design names this yielded-row filter as the fallback; a future
null-aware icebird filter can add the file-skip optimization on top.
- null-seq = new: a row whose `_hyp_ingest_seq` is null/absent
(pre-upgrade) is always yielded, so the one-time migration is at worst
a full re-export, never silent data loss. A table that never carried
the seq column yields everything.
- `after` is a monotonic high-water of real seqs, so a null-seq row
carries the prior watermark forward and progress never regresses even
when the scan visits seqs out of order (interleaved sources).
- `opts` absent ⇒ byte-for-byte the pre-existing full scan, so every
current caller is untouched until it opts in.
- Update the kernel-types decl: `SinkContinuation`, `ReadRowsOptions`,
the extended `readRows`, and the new `readRowsSince`.
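The filter and watermark rules in the list above can be sketched as a small generator. This is an illustration under stated assumptions, not the actual `scanRowsFromTable` code: `filterSince` is a hypothetical name, rows are assumed to be plain objects, and `after` is modeled as a bare bigint even though the real `SinkContinuation` token format is not shown here.

```javascript
// Hypothetical sketch of the yielded-row filter; not the PR's implementation.
const SEQ = '_hyp_ingest_seq';

function* filterSince(rows, since) {
  // `after` starts at the caller's watermark (null when reading from scratch).
  let after = since ?? null;
  for (const row of rows) {
    const seq = row[SEQ];
    const isLegacy = seq === null || seq === undefined;

    // null-seq = new: legacy rows are never filtered out.
    if (!isLegacy && since != null && seq <= since) continue;

    // Monotonic high-water: only a real, larger seq advances `after`,
    // so null-seq rows and out-of-order seqs never move it backwards.
    if (!isLegacy && (after === null || seq > after)) after = seq;

    // Strip the internal column before pairing the row with its token.
    const { [SEQ]: _dropped, ...visible } = row;
    yield { row: visible, after };
  }
}
```

Calling `filterSince(rows)` with no `since` yields every row, which mirrors the back-compat rule that an absent option means a full scan.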
Tests cover back-compat, after-token monotonicity, no-new-rows ≈0,
incremental new rows, the null-seq migration contract, the pure-legacy
(no seq column) table, and invalid-token rejection.
Task-Id: T2
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7db714e into integration/incremental-sink-reads
6 checks passed
Implements task T2 of change set `incremental-sink-reads` (LLP 0039/0040/0042).
Extends the kernel storage read contract so the forward and blob sinks can read only rows added since their last durable export:
- Back-compatible `opts.since` on `readRows` (absent ⇒ byte-for-byte the existing full scan; every current caller untouched).
- New `readRowsSince(tablePath, { since, columns })` that pairs each internal-stripped row with its `after` continuation token.
- `scanRowsFromTable` gains `opts.since` and applies the `seq > since` predicate as a yielded-row filter. It is deliberately NOT pushed into icebird's `scan({ where })`: icebird couples file/row-group pruning with a per-row match that drops nulls (`null > since` is false in both hyparquet and JS), which would silently skip the legacy null-seq rows the migration must preserve (LLP 0040 risk #1). The design (§2) names this yielded-row filter as the fallback; a future null-aware icebird filter can layer the file-skip optimization on top without changing the contract.
- null-seq = new: a row whose `_hyp_ingest_seq` is null/absent is always yielded, so the one-time upgrade is at worst a full re-export, never silent data loss. A table that never carried the seq column yields everything.
- `after` is a monotonic high-water of real seqs, so a null-seq row carries the prior watermark forward and progress never regresses even when the scan visits seqs out of order (interleaved sources; risk #3).
- Kernel-types decl updated: `SinkContinuation`, `ReadRowsOptions`, extended `readRows`, new `readRowsSince`.
Checks
- `npm test`: 1404 pass / 1 skip / 0 fail
- `npm run typecheck`: clean
- `npm run lint`: clean
New tests (`test/core/sink-reads-since.test.js`) cover back-compat, after-token monotonicity, no-new-rows ≈0, incremental new rows, the null-seq migration contract, pure-legacy (no seq column) tables, and invalid-token rejection.
Task-Id: T2
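The reason the predicate stays out of a per-row match can be seen in plain JavaScript. The snippet below only demonstrates the language-level comparison; it does not call icebird or hyparquet, and the row objects are made up for illustration.

```javascript
// Illustrative rows only; real table rows carry more columns.
const rows = [
  { id: 'legacy', _hyp_ingest_seq: null },
  { id: 'old', _hyp_ingest_seq: 4n },
  { id: 'new', _hyp_ingest_seq: 9n },
];
const since = 5n;

// Naive per-row match: in a relational comparison null is coerced to 0,
// and 0 > 5n is false, so the legacy row is silently dropped.
const naive = rows
  .filter((r) => r._hyp_ingest_seq > since)
  .map((r) => r.id);

// Null-aware match: a missing or null seq is treated as "new".
const nullAware = rows
  .filter((r) => r._hyp_ingest_seq == null || r._hyp_ingest_seq > since)
  .map((r) => r.id);

console.log(naive);     // [ 'new' ]
console.log(nullAware); // [ 'legacy', 'new' ]
```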
🤖 Generated with Claude Code