Skip to content

T5: wire core blob sink (local-fs + s3) to incremental readRowsSince#157

Merged
philcunliffe merged 1 commit into
integration/incremental-sink-readsfrom
task/incremental-sink-reads/T5
Jun 26, 2026
Merged

T5: wire core blob sink (local-fs + s3) to incremental readRowsSince#157
philcunliffe merged 1 commit into
integration/incremental-sink-readsfrom
task/incremental-sink-reads/T5

Conversation

@philcunliffe

Copy link
Copy Markdown
Contributor

Implements task T5 of the incremental-sink-reads change set (LLP 0042 plan; LLP 0040 design §4).

Wires the core blob sink (local-fs + s3 destinations) to the cursor-aware readRowsSince read surface (T2) and the persisted per-(sink instance, partition) watermark store (T3):

  • reads only rows with _hyp_ingest_seq > watermark each tick;
  • skips writing when there are no new rows (no blob, ~0 bytes);
  • embeds the [sinceSeq, lastSeq] range in the output filename / object key so a crash-retry re-PUTs the same key (idempotent overwrite — the blob sink's stand-in for the server idempotency ledger);
  • advances the watermark only after the durable write/PUT.

New shared helpers in src/core/sinks/incremental.js (exported via hypaware/core/sinks): openIncrementalRows, withSeqRangeFilename, watermarkKeyFor, createInstanceWatermarkStore. Because PluginPaths.stateDir is per-plugin (not per sink instance), the watermark store is scoped under the instance to honor the design's per-(sink instance, partition) contract.

Tests

  • test/core/sink-incremental.test.js — helper unit tests (filename range, peek/empty, since filter, null-seq carry-forward, instance isolation).
  • test/plugins/local-fs-incremental-export.test.js — end-to-end: ranged filename, watermark advance, skip-empty, new range, cumulative count.
  • test/plugins/s3-export-batch.test.js — rewritten for the new behavior (skip-empty, ranged key, watermark advance, idempotent re-PUT on lost watermark) while preserving the prior retry-semantics coverage.

npm test (1433 pass / 1 pre-existing skip / 0 fail), npm run typecheck, and npm run lint all pass.

Note: the two local-fs blob-sink smokes (blob_sink_parquet_local_fs, local_parquet_export) are pre-existing red on the integration branch for an unrelated reason — the sink driver hands the blob sink the drained spool path (dataset/all) rather than the committed source=<...> partition, so 0 rows are read regardless. Verified by running each on the untouched base branch. Their filename assertions were updated to match the ranged filename so they will pass once that upstream driver/discovery issue is resolved.

Task-Id: T5

🤖 Generated with Claude Code

Wire the local-fs and s3 blob destinations to the cursor-aware
readRowsSince surface (T2) and the persisted per-(sink instance,
partition) watermark store (T3), so each tick exports only rows added
since the sink's last durable PUT.

- New shared helpers in src/core/sinks/incremental.js (exported via
  hypaware/core/sinks): openIncrementalRows (peek-to-decide-empty,
  self-tracking rowCount + high-water lastAfter, feeds the unchanged
  encoder.encodePartition contract), withSeqRangeFilename (embeds
  [sinceSeq,lastSeq] before the extension), watermarkKeyFor, and
  createInstanceWatermarkStore.
- Empty new-row set writes no blob (skip, 0 bytes).
- The output filename/object key embeds the [sinceSeq,lastSeq] range so a
  crash-retry re-PUTs the same key (idempotent overwrite) — the blob
  sink's stand-in for the central server ledger.
- The watermark advances only after the durable write/PUT.
- PluginPaths.stateDir is per-plugin, not per sink instance, so the
  watermark store is scoped under the instance to honor the per-(sink
  instance, partition) contract.
- Tests: helper unit tests, a local-fs end-to-end incremental test
  (ranged filename, watermark advance, skip-empty, new range, cumulative
  count), and rewritten s3-export-batch tests (skip-empty, ranged key,
  watermark advance, idempotent re-PUT on lost watermark) preserving the
  prior retry-semantics coverage.
- Updated the two local-fs blob-sink smokes to match the ranged filename
  (note: both are pre-existing red on the integration branch for an
  unrelated reason — the driver hands the sink the drained spool path).

Task-Id: T5
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe philcunliffe merged commit 4e9f386 into integration/incremental-sink-reads Jun 26, 2026
6 checks passed
@philcunliffe philcunliffe deleted the task/incremental-sink-reads/T5 branch June 26, 2026 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant