fix(quarantine): prevent double catalog.schema qualification of quarantine table#110

Merged
rederik76 merged 2 commits into main from fix/quarantine-table-double-qualification on Jul 4, 2026
rederik76 (Collaborator) commented Jul 4, 2026

Summary

QuarantineManager._init_quarantine built the quarantine table name from self.target,
which BaseTargetDelta.__post_init__ has already folded into a fully qualified
catalog.schema.table. It then also passed quarantineTargetDetails.database, so
TargetFactory.create(...) applied the catalog and schema a second time. When a dataflow set
quarantineMode: "table" with a catalog-qualified database, this produced an over-qualified
identifier such as main.schema.main.schema.orders_quarantine, and the pipeline failed with:

Table identifier '…' has N parts, but Spark Declarative Pipelines supports at most three
parts (catalog.schema.table).
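
A minimal sketch of how the extra parts arise, assuming the qualification step simply prefixes the database to whatever table name it receives. `qualify` is a hypothetical stand-in written for illustration, not the actual TargetFactory.create(...) or BaseTargetDelta.__post_init__ code, and the `_quarantine` suffix is inferred from the example identifier above:

```python
from typing import Optional


def qualify(table: str, database: Optional[str] = None) -> str:
    """Hypothetical stand-in: prefix the database when one is given."""
    return f"{database}.{table}" if database else table


# The target has already been folded into catalog.schema.table.
target = qualify("orders", "main.schema")

# Pre-fix behaviour: the suffixed name is built from the qualified target,
# and the quarantine database is then applied again.
over_qualified = qualify(f"{target}_quarantine", "main.schema")

print(over_qualified)
print(len(over_qualified.split(".")))  # number of identifier parts
```

Under these assumptions the result has five parts, which exceeds the three-part limit quoted in the error.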

Fix

In src/dataflow/quarantine.py, add conditional handling that detects a supplied quarantine
database:

  • supplied database → derive the suffixed name from the unqualified target
    (self.target.split('.')[-1]) so it is qualified exactly once, and pass the database through.
  • no database → reuse the already-qualified target name and leave database unset.
  • explicit quarantine table → used verbatim (unchanged).

No behaviour change when quarantineTargetDetails omits database.
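
The three branches above can be sketched as follows. This is an illustration under assumptions, not the code in src/dataflow/quarantine.py: the function name `resolve_quarantine_table`, its parameters, the `_quarantine` suffix, and passing the database through unchanged for an explicit table are all hypothetical.

```python
from typing import Optional, Tuple


def resolve_quarantine_table(
    target: str,
    database: Optional[str] = None,
    explicit_table: Optional[str] = None,
    suffix: str = "_quarantine",  # assumed suffix, inferred from the example name
) -> Tuple[str, Optional[str]]:
    """Return (table_name, database) to hand to the qualification step."""
    if explicit_table:
        # Explicit quarantine table: used verbatim.
        return explicit_table, database
    if database:
        # Supplied database: start from the unqualified target so the
        # database is applied exactly once downstream.
        return target.split(".")[-1] + suffix, database
    # No database: reuse the already-qualified target and leave database unset.
    return target + suffix, None


with_db = resolve_quarantine_table("main.schema.orders", database="main.schema")
without_db = resolve_quarantine_table("main.schema.orders")
explicit = resolve_quarantine_table("main.schema.orders", explicit_table="orders_bad")
```

Returning the name and the database as a pair keeps the decision about qualification in one place, so the later qualification step never sees a qualified name together with a database.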

Changes

  • src/dataflow/quarantine.py — quarantine table identifier composition.
  • VERSION: v0.17.1 → v0.17.2.

Test plan

  • py_compile passes; logic verified across all four cases (supplied catalog-qualified db,
    no db, explicit table, schema-only db) — each resolves to ≤ 3 parts.
  • Feature-samples deploy + run job (_es, main, serverless) — SUCCESS
    (regression check: quarantine_table spec has no database, path unchanged).
  • [~] Pattern-samples 4-run load — runs 1 & 2 SUCCESS, runs 3–4 in progress.
  • End-to-end validation of the actual fixed branch (catalog-qualified quarantine database)
    via the tpch_sample silver pipeline, which previously reproduced the failure.

Related

  • Bug report: quarantine table double catalog.schema-qualification (see linked issue).

…ntine table

QuarantineManager derived the quarantine table name from self.target — which
BaseTargetDelta has already folded into a fully qualified catalog.schema.table —
while also passing quarantineTargetDetails.database. TargetFactory then
re-applied the database, producing an over-qualified identifier (e.g.
main.schema.main.schema.orders_quarantine) and failing the pipeline with
"Table identifier ... supports at most three parts (catalog.schema.table)".

When a quarantine database is supplied, derive the suffixed name from the
unqualified target so it is qualified exactly once; otherwise reuse the
already-qualified target name. Behaviour is unchanged when no quarantine
database is supplied (e.g. the feature-samples quarantine_table spec).

Bump version to v0.17.2.

…ne_table spec

Add a catalog-qualified `database` to the quarantine_table dataflow's
quarantineTargetDetails so the feature-samples suite exercises the supplied-
database quarantine path. This is the exact configuration that previously
produced an over-qualified table identifier; it now deploys and runs cleanly,
guarding against regressions of the double-qualification fix.
rederik76 merged commit 1b8170a into main Jul 4, 2026
rederik76 deleted the fix/quarantine-table-double-qualification branch July 4, 2026 03:34
Development

Successfully merging this pull request may close these issues.

[BUG]: Quarantine table gets double catalog.schema-qualified when quarantineTargetDetails.database is set
