feat(common): multi-network streaming query execution#2112
feat(common): multi-network streaming query execution#2112
Conversation
189c35b to
9d4f764
Compare
Implement timestamp-aligned streaming for queries spanning multiple networks. Materialized multi-network output is not included in this PR. Multi-network execution: - Branch on `blocks_tables.len()` for single vs multi-network paths - Add `blocks_table_fetch_by_timestamp()` for timestamp-keyed block lookups - Add `latest_src_watermark_multi()` for cross-network watermark consensus - Shared timestamp window: `start_ts = min(all starts)`, `end_ts = min(all ends)` - Rewind all networks on any single-network reorg - Reject `end_block` for multi-network queries in `spawn()` - Force `WatermarkColumn::Ts` for multi-network (block numbers are incomparable across networks) Integration tests: - Add `anvil_rpc_a`/`anvil_rpc_b` manifest fixtures for two-network testing - Add `set_block_timestamp_interval()` and `set_next_block_timestamp()` to Anvil fixture for deterministic block timestamps - Three tests with deterministic timestamp alignment: UNION ALL across networks, CROSS JOIN across networks, and reorg on one network triggering rewind-all behavior
9d4f764 to
457509b
Compare
| /// | ||
| /// In single-network mode this value is a block count. In multi-network mode it | ||
| /// is interpreted as a second-based interval because block numbers are | ||
| /// incomparable across networks. |
There was a problem hiding this comment.
It means what we want it to mean. Microbatch sizes are a social construct anyways 😄
Best thing would be to start thinking about making these sizes adaptive somehow.
| // The cursor stores the previous end block's timestamp, but the Delta | ||
| // filter uses an inclusive lower bound (_ts >= start). Fetch the actual | ||
| // start block to get the correct timestamp and avoid re-including the | ||
| // previous batch's rows. |
There was a problem hiding this comment.
Extra SQL queries in this path are never great. The trick we did for block numbers was to set the start as prev.number + 1, under the semantics that the start block number is not required to correspond to a block that exists. Could a similar trick apply here, prev.timestamp + 1?
I remember we rediscussed theses semantics when skipped Solana slots actually came up, but don't recall exactly where we landed. But the number + 1 code is still there 🤷
There was a problem hiding this comment.
I agree that this is not great. I'm hoping to find a way to optimize as a followup, which is part of the reason I've kept the single-network path isolated from these changes.
_block_num + 1 is still a valid lower bound for the scan, even with block-number gaps. _ts + 1 is different, because adjacent segments can legitimately share the same timestamp (especially for sub-second block times, which we normalize to whole seconds). I ran into that on local anvil chains while preparing the demo.
There was a problem hiding this comment.
Ok that makes sense.
But just in terms of readability the way the timestamp is handled in SegmentStart is quite hacky, a dead value is written in next_microbatch_start_for_network only to be overwritten here.
Implement timestamp-aligned streaming for queries spanning multiple networks. Materialized multi-network output is not included in this PR.
Multi-network execution:
blocks_tables.len()for single vs multi-network pathsblocks_table_fetch_by_timestamp()for timestamp-keyed block lookupslatest_src_watermark_multi()for cross-network watermark consensusstart_ts = min(all starts),end_ts = min(all ends)end_blockfor multi-network queries inspawn()WatermarkColumn::Tsfor multi-network (block numbers are incomparable across networks)Integration tests:
anvil_rpc_a/anvil_rpc_bmanifest fixtures for two-network testingset_block_timestamp_interval()andset_next_block_timestamp()to Anvil fixture for deterministic block timestamps