Update ingestion pipeline to handle timeseries table#528
Not up to standards ⛔🔴 Issues
| Category | Results |
|---|---|
| Compatibility | 3 high |
| CodeStyle | 5 minor |
| Complexity | 2 medium |
🟢 Metrics 70 complexity · 6 duplication
7058981 to 65a60e7
/gemini review
Code Review
This pull request migrates the Spanner storage schema and ingestion pipeline from a protobuf-based Observations column to a relational model featuring a TimeSeries table (storing metadata as JSON) and an interleaved Observation table for individual data points. Feedback focuses on performance and maintainability improvements: optimizing GraphReader.extractObservations by constructing the TimeSeriesKey once outside the loop rather than for every data point, refactoring SpannerClient.toObservationKVMutations to reuse toTimeSeriesMutation to eliminate duplicate mutation-building logic, and simplifying toObservation to accept TimeSeriesKey directly while updating the corresponding tests.
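The first suggestion, building the key once instead of once per data point, can be sketched as follows. This is a minimal illustration, not the PR's code: the `TimeSeriesKey` fields, the `ObservationRow` type, and the method signature are all hypothetical stand-ins, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HoistKeySketch {
    // Hypothetical stand-ins; the real TimeSeriesKey and observation types are defined in the PR.
    record TimeSeriesKey(String variableMeasured, String observationAbout, String facetId) {}
    record ObservationRow(TimeSeriesKey key, String date, String value) {}

    static List<ObservationRow> extractObservations(
            String variableMeasured,
            String observationAbout,
            String facetId,
            Map<String, String> valuesByDate) {
        // The key is identical for every point in the series, so build it once here
        // rather than inside the loop.
        TimeSeriesKey key = new TimeSeriesKey(variableMeasured, observationAbout, facetId);

        List<ObservationRow> rows = new ArrayList<>(valuesByDate.size());
        for (Map.Entry<String, String> point : valuesByDate.entrySet()) {
            // Each row shares the same key instance; only date and value vary.
            rows.add(new ObservationRow(key, point.getKey(), point.getValue()));
        }
        return rows;
    }
}
```

Sharing one key instance avoids an allocation per data point, which adds up for long series.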
29592ac to f8292be
n-h-diaz left a comment
Thanks!
I had a few questions, mostly about what the final schema looks like (I was following the schema in datcom-store.dc-graph-staging.dc_graph).
DataCommons Spanner Graph is migrating to a new schema that introduces separate TimeSeries and Observation tables in place of the current Observation table. This allows us to remove the Observation proto used to store the timeseries. This PR updates the ingestion pipeline to populate the tables in the new schema. To do this, we eliminate the step that builds the optimized graph (timeseries) from the input observations and instead create timeseries and observation mutations directly from each input observation. This avoids the costly group-by operation on the observations.
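A minimal sketch of the per-observation mapping described above, with no grouping step. Everything here is illustrative: the `InputObservation` and `Mutation` records, the column names, and the choice of key columns are assumptions, not the PR's schema or the Spanner client API. It also assumes the TimeSeries row is written with insert-or-update semantics, so emitting it again for every observation in the same series is harmless.

```java
import java.util.List;
import java.util.Map;

class DirectMutationSketch {
    // Hypothetical input shape: one observation carries its own series identity.
    record InputObservation(
            String variableMeasured,
            String observationAbout,
            String facetId,
            String date,
            String value) {}

    // Hypothetical stand-in for a storage mutation: a target table plus column values.
    record Mutation(String table, Map<String, String> columns) {}

    // Maps one input observation to one TimeSeries row and one Observation row.
    // No group-by is needed because each observation is handled independently.
    static List<Mutation> toMutations(InputObservation obs) {
        Mutation series = new Mutation("TimeSeries", Map.of(
                "variable_measured", obs.variableMeasured(),
                "observation_about", obs.observationAbout(),
                "facet_id", obs.facetId()));

        // The child row repeats the parent key columns (assumed here) and adds the data point.
        Mutation point = new Mutation("Observation", Map.of(
                "variable_measured", obs.variableMeasured(),
                "observation_about", obs.observationAbout(),
                "facet_id", obs.facetId(),
                "date", obs.date(),
                "value", obs.value()));

        return List.of(series, point);
    }
}
```

The trade-off in this sketch is that the series row is rewritten once per observation in exchange for skipping the shuffle that a group-by would require.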