T2: cursor-aware incremental storage reads (readRows since + readRowsSince) #154

Merged: philcunliffe merged 1 commit into integration/incremental-sink-reads from task/incremental-sink-reads/T2 on Jun 25, 2026

@philcunliffe

Contributor

Implements task T2 of change set incremental-sink-reads (LLP 0039/0040/0042).

Extends the kernel storage read contract so the forward and blob sinks can read only rows added since their last durable export:

  • Back-compatible opts.since on readRows (absent ⇒ byte-for-byte the existing full scan; every current caller untouched).
  • New cursor-aware readRowsSince(tablePath, { since, columns }) that pairs each internal-stripped row with its after continuation token.
  • scanRowsFromTable gains opts.since and applies the seq > since predicate as a yielded-row filter. It is deliberately NOT pushed into icebird's scan({ where }): icebird couples file/row-group pruning with a per-row match that drops nulls (null > since is false in both hyparquet and JS), which would silently skip the legacy null-seq rows the migration must preserve (LLP 0040 risk #1). The design (§2) names this yielded-row filter as the fallback; a future null-aware icebird filter can layer the file-skip optimization on top without changing the contract.
  • null-seq = new: a row whose _hyp_ingest_seq is null/absent is always yielded, so the one-time upgrade is at worst a full re-export, never silent data loss. A table that never carried the seq column yields everything.
  • after is a monotonic high-water of real seqs, so a null-seq row carries the prior watermark forward and progress never regresses even when the scan visits seqs out of order (interleaved sources; risk #3).
  • Kernel-types decl updated: SinkContinuation, ReadRowsOptions, extended readRows, new readRowsSince.
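
For readers skimming the contract, the yielded-row filter described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `filterSince` is a hypothetical name, the rows are plain objects, and the `after` token is modelled as a bare bigint (the real `SinkContinuation` shape may differ). Only `_hyp_ingest_seq`, `since`, and `after` come from the description.

```javascript
// Illustrative sketch only. `filterSince` is a hypothetical helper; the real
// scanRowsFromTable / readRowsSince implementation may be structured differently.
const SEQ = '_hyp_ingest_seq'

function* filterSince(rows, since) {
  // `after` starts at the caller's watermark and only ever moves up.
  let after = since ?? null
  for (const row of rows) {
    const seq = row[SEQ]
    const isLegacy = seq === null || seq === undefined
    // null-seq = new: legacy rows are never filtered out.
    if (!isLegacy && since != null && seq <= since) continue
    // Monotonic high-water: a lower or missing seq cannot move `after` backwards.
    if (!isLegacy && (after === null || seq > after)) after = seq
    // Strip the internal column before the row reaches a sink.
    const { [SEQ]: _ignored, ...visible } = row
    yield { row: visible, after }
  }
}

const rows = [
  { id: 'a', [SEQ]: 1n },   // at or below the watermark: skipped
  { id: 'b', [SEQ]: null }, // legacy row: yielded, carries after = 2n forward
  { id: 'c', [SEQ]: 5n },   // new row: after becomes 5n
  { id: 'd', [SEQ]: 3n },   // out-of-order seq: yielded, after stays 5n
]
const yielded = [...filterSince(rows, 2n)]
```

With `since = 2n`, the sketch yields `b`, `c`, and `d` with `after` values `2n`, `5n`, and `5n`; calling it without `since` yields every row, which mirrors the back-compatible full scan.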

Checks

  • npm test — 1404 pass / 1 skip / 0 fail
  • npm run typecheck — clean
  • npm run lint — clean

New tests (test/core/sink-reads-since.test.js) cover back-compat, after-token monotonicity, no-new-rows ≈0, incremental new rows, the null-seq migration contract, pure-legacy (no seq column) tables, and invalid-token rejection.
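
From a sink's point of view, the behaviors those tests exercise might look like the sketch below. Everything here is an assumption made for illustration: `makeToyStorage` and `exportOnce` are hypothetical, the toy reader is synchronous and in-memory, and the cursor is a bare bigint. The real `readRowsSince` is only known from this PR to take `(tablePath, { since, columns })` and to pair each row with its `after` token.

```javascript
// Toy stand-in for the storage contract, for illustration only.
// Assumptions: synchronous iteration and a bare bigint continuation token.
function makeToyStorage(table) {
  return {
    *readRowsSince(_tablePath, { since } = {}) {
      let after = since ?? null
      for (const { seq, ...row } of table) {
        const legacy = seq == null
        if (!legacy && since != null && seq <= since) continue
        if (!legacy && (after === null || seq > after)) after = seq
        yield { row, after }
      }
    },
  }
}

// Hypothetical sink pass: emit each new row, then return the cursor to persist.
function exportOnce(storage, tablePath, cursor, emit) {
  let next = cursor
  for (const { row, after } of storage.readRowsSince(tablePath, { since: cursor })) {
    emit(row)
    if (after !== null) next = after
  }
  return next
}

const table = [{ id: 'a', seq: 1n }, { id: 'b', seq: 2n }]
const storage = makeToyStorage(table)
const exported = []
const emit = (row) => exported.push(row.id)

// First pass (no cursor) exports a and b; the cursor becomes 2n.
let cursor = exportOnce(storage, 'events', undefined, emit)
// Second pass finds nothing new; the cursor stays 2n.
cursor = exportOnce(storage, 'events', cursor, emit)
// After one more row is ingested, the third pass exports only c.
table.push({ id: 'c', seq: 3n })
cursor = exportOnce(storage, 'events', cursor, emit)
```

A real sink would presumably persist the returned cursor only once the export itself is durable, so that a crash replays rows rather than skipping them.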

Task-Id: T2

🤖 Generated with Claude Code

Add a back-compatible `opts.since` to `readRows` and a cursor-aware
`readRowsSince` sibling that pairs each internal-stripped row with its
`after` continuation token, so the forward and blob sinks can read only
rows added since their last durable export.

- `scanRowsFromTable` gains an `opts.since` (bigint `_hyp_ingest_seq`
  watermark) and applies a `seq > since` predicate as a yielded-row
  filter. It is NOT pushed into icebird's `scan({ where })`: icebird
  couples file/row-group pruning with a per-row match that drops nulls
  (`null > since` is false), which would silently skip the legacy
  null-seq rows the migration must preserve (LLP 0040 risk #1). The
  design names this yielded-row filter as the fallback; a future
  null-aware icebird filter can add the file-skip optimization on top.
- null-seq = new: a row whose `_hyp_ingest_seq` is null/absent
  (pre-upgrade) is always yielded, so the one-time migration is at worst
  a full re-export, never silent data loss. A table that never carried
  the seq column yields everything.
- `after` is a monotonic high-water of real seqs, so a null-seq row
  carries the prior watermark forward and progress never regresses even
  when the scan visits seqs out of order (interleaved sources).
- `opts` absent ⇒ byte-for-byte the pre-existing full scan, so every
  current caller is untouched until it opts in.
- Update the kernel-types decl: `SinkContinuation`, `ReadRowsOptions`,
  the extended `readRows`, and the new `readRowsSince`.

Tests cover back-compat, after-token monotonicity, no-new-rows ≈0,
incremental new rows, the null-seq migration contract, the pure-legacy
(no seq column) table, and invalid-token rejection.

Task-Id: T2

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe philcunliffe merged commit 7db714e into integration/incremental-sink-reads Jun 25, 2026
6 checks passed
@philcunliffe philcunliffe deleted the task/incremental-sink-reads/T2 branch June 25, 2026 23:29