fix: bq depsdev deadlock and a couple of fixes and optimizations#4238

Open
themarolt wants to merge 22 commits into main from fix/bq-depsdev-deadlock

themarolt commented Jun 19, 2026

Summary

Fixes several issues discovered during the first full bootstrap of package_dependencies at scale (~1B rows): a deadlock risk in index management, premature job completion before indexes finished rebuilding, monitor display bugs, and missing version_constraint data for ecosystems loaded via the cheaper BQ table (option B). Also adds workflow monitoring to the deploy scripts and a one-shot dedup script for manual recovery.

Changes

  • Parallel index builds: secondary indexes on package_dependencies and versions now build concurrently instead of sequentially — cuts index rebuild time significantly on partitioned tables
  • Parallel dedup: cross-chunk duplicate removal on package_dependencies runs 8 partitions at a time instead of 1, reducing wall-clock from ~10h to ~1-2h
  • Premature done fix: full-load jobs no longer mark themselves done on the last merge chunk — isFinal: true is deferred until after index rebuild and constraint validation complete
  • 24h activity timeout: rebuildPackageDepsIndexes and rebuildPackageDepsConstraints timeout raised from 12h to 24h to cover worst-case parallel dedup on 1B+ rows
  • --fill-constraints mode: new trigger flag re-exports BQ deps data and upserts version_constraint only where NULL — for ecosystems initially loaded via option B (no version constraint)
  • dedupPackageDeps script: standalone script to manually run dedup + UNIQUE constraint rebuild on package_dependencies without re-running the full workflow
  • Monitor fixes: removed non-existent ecosystems column from query, fixed column truncation causing adjacent columns to jam, added scroll support for long job lists
  • Deploy script monitoring: deploy-staging and deploy-production now wait for the triggered GitHub Actions workflow and report success/failure
  • GO/NUGET BQ table fix: GO and NUGET deps come from ecosystem-specific tables (GoRequirementsLatest, NuGetRequirementsLatest) — they were previously missing or misconfigured
  • setJobStep activity: extracted step tracking into a reusable activity with proper timeout/retry config; added step tracking across more workflow phases
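
The parallel dedup bullet above describes running 8 partitions at a time. A minimal sketch of that kind of bounded concurrency follows. `runWithConcurrency` and `dedupPartition` are hypothetical names, not the PR's actual helpers, and the real activity presumably issues SQL per partition rather than returning a string.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error(`limit must be a positive integer, got ${limit}`)
  }
  const results: R[] = new Array<R>(items.length)
  let next = 0

  // Each lane claims the next unprocessed index until the list is exhausted.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index])
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane))
  return results
}

// Stand-in for the per-partition dedup statement (assumption: one call per partition).
async function dedupPartition(partition: string): Promise<string> {
  return `deduped ${partition}`
}
```

Calling `runWithConcurrency(partitions, 8, dedupPartition)` would keep up to eight partitions busy and return the results in input order. Rejecting a non-positive or non-integer limit up front avoids a run that silently does nothing.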

Type of change

  • Bug fix
  • New feature
  • Performance improvement

Note

High Risk
Changes affect production-scale package dependency loads, constraint rebuilds, and concurrent DB writes on package_dependencies/repos; mis-timed job completion or dedup could leave tables without constraints or with bad data until recovery scripts run.

Overview
Hardens ~1B-row package_dependencies full bootstrap: parallel secondary index builds and 8-way partition dedup (with higher work_mem), 24h rebuild activity timeouts, and jobs that stay non-terminal until index/constraint rebuild finishes—not on the last merge chunk. Adds --fill-constraints / MERGE_SQL_FILL_CONSTRAINTS to backfill version_constraint after Option B loads, plus dedup-package-deps for manual dedup + UNIQUE rebuild.
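
The deferred-completion rule described above can be sketched as a small decision function. The phase names and the `PhaseEvent` shape are hypothetical, and the behavior for incremental loads (finishing on the last merge chunk) is an assumption; only the full-load rule comes from the PR description.

```typescript
// Hypothetical phase names; the PR does not list the workflow's actual step labels.
type Phase = 'merge' | 'rebuild-indexes' | 'validate-constraints'

interface PhaseEvent {
  phase: Phase
  isLastChunk: boolean // only meaningful during 'merge'
  isFullLoad: boolean
}

// Decide the isFinal flag to attach to a job status update.
function isFinalFor(event: PhaseEvent): boolean {
  if (!event.isFullLoad) {
    // Assumption: incremental loads have no rebuild phase, so the last merge chunk ends the job.
    return event.phase === 'merge' && event.isLastChunk
  }
  // Full loads stay non-terminal until constraint validation has completed.
  return event.phase === 'validate-constraints'
}
```

With this rule, a full-load job on its last merge chunk still reports `isFinal: false`, so a monitor cannot show it as done while indexes are rebuilding.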

Deps BQ SQL now unions ecosystem-specific sources: graph/edges tables for NPM/MAVEN/PYPI/CARGO and GoRequirements* / NuGetRequirements* for GO/NUGET; ecosystem arrays flow through exports as meta:ecosystems. setJobStep + mergeJobTableRowCounts expose phase labels in monitor:osspckgs (stuck-job detection, scroll). Scorecard repo updates use ordered FOR UPDATE to avoid deadlocks with the GitHub enricher. The staging and production deploy scripts run gh run watch after triggering the workflow. ADR-0003 documents that GO/NUGET are absent from the graph tables.
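
Ordered FOR UPDATE is a common way to avoid lock-order deadlocks in Postgres: if every writer takes its row locks in the same sorted order, two transactions cannot each hold a row the other is waiting for. A sketch of the shape such a statement can take, held in a TypeScript constant, follows. The staging table and column names are assumptions for illustration, not the PR's actual SQL.

```typescript
// Hypothetical names: `staging_scorecard`, `repo_id`, `score`, and `scorecard_score`
// are illustrative only. The pattern is: lock target rows in id order, then update.
const LOCK_THEN_UPDATE_SQL = `
WITH locked AS (
  SELECT r.id
  FROM repos r
  JOIN staging_scorecard s ON s.repo_id = r.id
  ORDER BY r.id
  FOR UPDATE OF r
)
UPDATE repos r
SET scorecard_score = s.score
FROM staging_scorecard s, locked l
WHERE r.id = l.id
  AND s.repo_id = r.id
`
```

The CTE acquires the row locks in `id` order before the UPDATE touches anything, so the update itself never waits on a lock out of order. This only helps if concurrent writers to the same rows follow the same ordering.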

Reviewed by Cursor Bugbot for commit 26f0680. Bugbot is set up for automated code reviews on this repo.

themarolt added 12 commits June 17, 2026 12:06
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 19, 2026 09:57
@github-actions

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

Comment thread services/apps/packages_worker/src/deps-dev/workflows/ingestDependencies.ts Outdated
Comment thread services/apps/packages_worker/src/deps-dev/queries/depsSql.ts

Copilot AI left a comment

⚠️ Not ready to approve

There are a few concrete issues in the changed code (notably workflow-run detection in scripts/cli and a missing maintenance_work_mem application that the code comments rely on) that should be corrected before merge.

Pull request overview

This PR hardens and speeds up the deps.dev BigQuery → Postgres bootstrap pipeline at very large scale (≈1B package_dependencies rows), addressing deadlock risk, premature “done” marking, missing version_constraint backfills, and improving operational visibility (job-step tracking, monitor UI fixes, deploy script workflow monitoring).

Changes:

  • Adds step/meta tracking to ingest jobs (meta:step, meta:ecosystems) and improves the terminal monitor UX (stuck detection, scroll, column sizing, truncation fixes).
  • Improves full-load correctness + performance (parallel index builds, parallel dedup, defers isFinal: true until after rebuild/validation; adds --fill-constraints backfill mode).
  • Adds operational tooling: deploy workflow monitoring in scripts/cli and a one-shot dedup-package-deps recovery script.
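
The step/meta tracking above stores keys such as meta:step and meta:ecosystems alongside numeric row counts in table_row_counts. A minimal sketch of a shallow merge with the same semantics as Postgres `jsonb || jsonb` follows. The function name, the value type, and the example step label are assumptions; the PR's own helper may differ.

```typescript
// Values stored in table_row_counts: numeric row counts plus string or array metadata.
type RowCountValue = number | string | string[]
type TableRowCounts = Record<string, RowCountValue>

// Shallow merge: keys in `patch` overwrite keys in `existing`, other keys are kept.
function mergeTableRowCounts(
  existing: TableRowCounts | null,
  patch: TableRowCounts,
): TableRowCounts {
  return { ...(existing ?? {}), ...patch }
}

// Example: attach display metadata without losing the row count already recorded.
const merged = mergeTableRowCounts(
  { package_dependencies: 1000 },
  { 'meta:step': 'rebuild-indexes', 'meta:ecosystems': ['GO', 'NUGET'] },
)
```

Prefixing display keys with `meta:` keeps them distinguishable from table names, so a monitor can skip them when summing row counts.
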
File summaries

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/osspckgs/ingestJobs.ts | Expands tableRowCounts typing and adds helper to merge display/meta keys into table_row_counts. |
| services/apps/packages_worker/src/scripts/triggerBootstrap.ts | Adds --fill-constraints flag and passes it into the Temporal bootstrap workflow. |
| services/apps/packages_worker/src/scripts/monitorOsspckgs.ts | Enhances job list display (step labels, stuck highlighting, scrolling, truncation behavior, ecosystem extraction). |
| services/apps/packages_worker/src/scripts/exportToBucket.ts | Adjusts deps export SQL generation to use the new deps SQL API + default ecosystems. |
| services/apps/packages_worker/src/scripts/dedupPackageDeps.ts | New standalone script to dedup cross-chunk duplicates and rebuild the UNIQUE constraint on package_dependencies. |
| services/apps/packages_worker/src/scorecard/workflows/ingestScorecard.ts | Updates merge SQL to use ordered row locking to avoid deadlocks with concurrent repos updates. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestVersions.ts | Defers “done” until after index/constraint rebuild, adds step tracking, and tweaks BQ max-bytes for full loads. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestRepos.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestPackages.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependentCounts.ts | Adds step tracking before guard checks. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependencies.ts | Adds --fill-constraints mode, extends timeouts, defers finalization for full-load rebuild phases, and adds step tracking. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestAdvisories.ts | Adjusts BQ max-bytes and propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/bootstrapOsspckgs.ts | Wires fillConstraints option through the top-level bootstrap workflow. |
| services/apps/packages_worker/src/deps-dev/queries/depsSql.ts | Refactors deps SQL generation to support GO/NUGET ecosystem-specific tables + new full/incremental builders. |
| services/apps/packages_worker/src/deps-dev/activities/setJobStep.ts | New activity to write step state into ingest job metadata (meta:step). |
| services/apps/packages_worker/src/deps-dev/activities/manageVersionsIndexes.ts | Builds secondary indexes in parallel (per-connection) and performs dedup + UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/managePackageDepsIndexes.ts | Builds secondary indexes in parallel and runs parallel partitioned dedup to speed up UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/index.ts | Exports the new setJobStep activity. |
| services/apps/packages_worker/src/deps-dev/activities/bqExportToGcs.ts | Adds optional ecosystems metadata to ingest-job table_row_counts for monitor display/diagnostics. |
| services/apps/packages_worker/src/criticality/activities.ts | Marks the ingest job status as merging before ranking merge. |
| services/apps/packages_worker/package.json | Adds dedup-package-deps scripts (and local variant). |
| scripts/cli | Adds deploy workflow-run monitoring via gh run watch and conclusion reporting. |
| docs/adr/README.md | Updates ADR-0003 title/summary to reflect GO + NUGET trigger condition. |
| docs/adr/0003-deps-bq-table-selection.md | Updates ADR-0003 wording to include GO as well as NUGET in the decision/trigger language. |

Copilot's findings

  • Files reviewed: 24/24 changed files
  • Comments generated: 4

Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Comment thread scripts/cli Outdated
Comment thread scripts/cli
Copilot AI review requested due to automatic review settings June 19, 2026 11:31
Comment thread scripts/cli

Copilot AI left a comment

⚠️ Not ready to approve

The new dedup-package-deps recovery script can hang or silently no-op with invalid/missing --concurrency, and there are correctness/consistency fixes needed in the new DAL JSONB merge helper and the versions-index rebuild performance changes.
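
One way to guard against the invalid or missing --concurrency case described above is to validate the flag before any work starts. This is a hedged sketch, not the script's actual parser: `parseConcurrency` is a hypothetical name, and the default of 8 is an assumption borrowed from the "8 partitions at a time" figure in the PR description.

```typescript
// Hypothetical parser for a --concurrency flag value.
function parseConcurrency(raw: string | undefined, fallback = 8): number {
  // A missing flag falls back to the default instead of producing NaN.
  if (raw === undefined) return fallback
  // Accept only plain decimal digits so "", "abc", "-1", and "2.5" are rejected up front.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`--concurrency must be a positive integer, got "${raw}"`)
  }
  const value = Number(raw)
  if (value < 1) {
    throw new Error(`--concurrency must be at least 1, got ${value}`)
  }
  return value
}
```

Failing fast here means a typo in the flag produces an error message rather than a worker pool of size zero or NaN.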

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 3

Comment thread services/apps/packages_worker/src/scripts/dedupPackageDeps.ts
Comment thread services/apps/packages_worker/src/deps-dev/activities/manageVersionsIndexes.ts Outdated
Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Copilot AI review requested due to automatic review settings June 19, 2026 12:01

Copilot AI left a comment

⚠️ Human review recommended

It changes core billion-row ingest/index/constraint rebuild behavior and includes at least one confirmed SQL/DB execution bug (transaction-scoped SET LOCAL) that should be fixed and revalidated before approval.
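
For context on the SET LOCAL finding: in Postgres, SET LOCAL lasts only until the end of the current transaction, and issuing it outside a transaction block has no lasting effect. The review thread with the exact bug is collapsed here, so the following is only a general sketch of the usual fix, which is to run the setting and the statements that depend on it inside one transaction on one connection. `withLocalSettings` and the `SqlClient` shape are hypothetical; the sample value `256MB` is illustrative.

```typescript
// Minimal client shape for this sketch; a node-postgres client's query(text) fits it.
interface SqlClient {
  query(sql: string): Promise<unknown>
}

// Run `body` inside a transaction with the given settings applied via SET LOCAL.
async function withLocalSettings<T>(
  client: SqlClient,
  settings: Record<string, string>,
  body: () => Promise<T>,
): Promise<T> {
  await client.query('BEGIN')
  try {
    for (const [name, value] of Object.entries(settings)) {
      // SET does not accept bind parameters, so validate before interpolating.
      if (!/^[a-z_]+$/.test(name) || !/^\d+(kB|MB|GB)?$/.test(value)) {
        throw new Error(`refusing to set ${name} = ${value}`)
      }
      await client.query(`SET LOCAL ${name} = '${value}'`)
    }
    const result = await body()
    await client.query('COMMIT')
    return result
  } catch (err) {
    // Roll back so the connection is not left inside a failed transaction.
    await client.query('ROLLBACK')
    throw err
  }
}
```

A call such as `withLocalSettings(client, { work_mem: '256MB' }, () => client.query(dedupSql))` must use a single dedicated connection. Going through a pool's one-off query method could run each statement on a different connection, which defeats the setting.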

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 2

Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Copilot AI review requested due to automatic review settings June 19, 2026 12:15

Copilot AI left a comment

⚠️ Human review recommended

It changes multiple high-risk billion-row ingestion and Postgres maintenance paths and still has open correctness/operability concerns (job metadata consistency, fill-constraints update semantics, and memory-pressure tuning for parallel maintenance).

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 6

Comment thread services/apps/packages_worker/src/scripts/dedupPackageDeps.ts
Copilot AI review requested due to automatic review settings June 19, 2026 15:52
Signed-off-by: Uroš Marolt <uros@marolt.me>

Copilot AI left a comment

⚠️ Human review recommended

It changes high-impact ingestion/DDL behavior at 1B-row scale and includes at least one correctness guard that should be addressed before merge.

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 1

Comment thread services/apps/packages_worker/src/deps-dev/queries/depsSql.ts
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 19, 2026 16:11

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 26f0680.

Comment thread services/apps/packages_worker/src/scripts/monitorOsspckgs.ts

Copilot AI left a comment

⚠️ Human review recommended

Changes alter production-scale ingestion/maintenance behavior (parallel index/dedup, job finalization semantics, and new fill mode) and include operational-risk knobs that should be validated and tuned via follow-up fixes.

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 5

Comment thread services/apps/packages_worker/src/scripts/triggerBootstrap.ts
Comment thread services/apps/packages_worker/src/scripts/monitorOsspckgs.ts
Comment thread services/apps/packages_worker/src/scripts/dedupPackageDeps.ts
themarolt requested a review from mbani01 June 19, 2026 19:56

2 participants