Problem
During the Built-On demo prep (May 18–19, 2026) on a Lakebase / PostgreSQL backend
we hit:
DataJointError: Cannot join on attribute 'image_id': different lineages
(blob_detection.image.image_id vs None). Use .proj() to rename one of the attributes.
on the very first populate() of a dj.Imported / dj.Computed table whose PK was
fully FK-inherited from a single parent. The semantic-check is correct in principle:
both sides of a populate antijoin should agree on the source-table lineage of the
inherited PK attribute. After a clean schema drop + redeclare on the same workspace,
the bug went away — Image.proj().heading["image_id"].lineage came back equal to ImageSpec.proj().heading["image_id"].lineage, both 'blob_detection.image_spec.image_id'.
This pattern is concerning because:
The user can't tell from the error message whether the issue is their schema
(e.g., a real lineage mismatch the semantic-check exists to catch) or a stale
internal record they have no obvious way to inspect.
The recovery (schema.rebuild_lineage() or full drop+redeclare) is undocumented
in the error.
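One way to inspect the state described above is to compare the lineage each side reports for the inherited attribute. The following is a minimal sketch, not part of DataJoint: `compare_lineage` and `fake_table` are hypothetical names, and the stand-in objects only mimic the `.proj().heading[attr].lineage` access path quoted earlier so the example runs without a database.

```python
from types import SimpleNamespace


def compare_lineage(child, parent, attr):
    """Return (child_lineage, parent_lineage, match) for one attribute.

    Hypothetical helper: both arguments only need to support
    obj.proj().heading[attr].lineage.
    """
    child_lineage = child.proj().heading[attr].lineage
    parent_lineage = parent.proj().heading[attr].lineage
    # A missing lineage on either side never counts as a match.
    match = child_lineage is not None and child_lineage == parent_lineage
    return child_lineage, parent_lineage, match


def fake_table(lineage, attr="image_id"):
    """Stand-in for a table object; NOT a DataJoint class."""
    heading = {attr: SimpleNamespace(lineage=lineage)}
    return SimpleNamespace(proj=lambda: SimpleNamespace(heading=heading))


expected = "blob_detection.image_spec.image_id"
stale = compare_lineage(fake_table(None), fake_table(expected), "image_id")
healthy = compare_lineage(fake_table(expected), fake_table(expected), "image_id")
print(stale)
print(healthy)
```

With real tables, the arguments would be the table objects from this report (for example `Image` and `ImageSpec`). A `None` on one side suggests a missing or stale lineage record rather than a genuine schema mismatch.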
Originally I (incorrectly) diagnosed this as a missing fix in the populate-antijoin
code path and opened #1453 ("fix: Disable semantic_check on populate antijoin (parallels #1383)") to disable semantic_check; I closed it once we found the
lineage table itself had been the issue. The right fix lives in declaration /
rebuild logic, not in the antijoin.
Speculation: what could leave stale entries in ~lineage?
These are speculations, not verified causes — flagged for investigation:
DJ version skew across the table's history. A table is first declared under
a DJ version whose declaration logic writes a lineage row, then later the user
upgrades DJ — the new code expects different lineage-string formatting (e.g.,
the pre-6506badb PostgreSQL-double-quote situation we hit in January) and the
live row no longer matches what proj() computes at query time. The table
itself looks healthy via information_schema; only ~lineage is stale.
Partial declare on PostgreSQL where the lineage insert fails silently or
gets rolled back independently of the DDL. Lakebase + PG 17 may have quirks here
that MySQL doesn't.
Schema-definition changes that don't re-touch the table. Today's @schema
decorator path may skip re-writing ~lineage if the table already exists
(is_declared == True). If the in-code definition diverges from what's stored
— e.g., a FK retargeting that doesn't change column types and so passes the
schema-equivalence check — ~lineage keeps the old parent linkage forever.
Manual DROP TABLE outside DataJoint that leaves orphan ~lineage rows
pointing at a no-longer-existent table.
Multiple schemas with overlapping table names during dev (we drop+redeclare
many times during this kind of work). If the ~lineage cleanup keys by table
identity in a way that differs from the redeclare's identity, ghost rows can
accumulate.
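The orphan-row scenario above could be checked with a simple set difference between the tables named in `~lineage` and the tables that actually exist. This is a sketch under stated assumptions: the row layout (`table_name`, `attribute_name`, `lineage` keys) is assumed for illustration and may not match the real `~lineage` columns, and `find_orphan_lineage_rows` is a hypothetical helper.

```python
def find_orphan_lineage_rows(lineage_rows, existing_tables):
    """Return lineage rows whose table is no longer present.

    lineage_rows: iterable of dicts with a 'table_name' key (assumed layout).
    existing_tables: iterable of table names currently in the schema.
    """
    existing = set(existing_tables)
    return [row for row in lineage_rows if row["table_name"] not in existing]


# Example data only; 'old_blob' plays the role of a table dropped by hand.
rows = [
    {"table_name": "image", "attribute_name": "image_id",
     "lineage": "blob_detection.image_spec.image_id"},
    {"table_name": "old_blob", "attribute_name": "image_id",
     "lineage": "blob_detection.image_spec.image_id"},
]
orphans = find_orphan_lineage_rows(rows, ["image", "image_spec"])
print([row["table_name"] for row in orphans])
```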
Proposed direction
Even without pinning the root cause, the user-visible failure is bad enough to
warrant defensive declaration logic:
Short term (declaration robustness)
On every @schema decoration of a table (whether is_declared is True or False), clear and re-insert the table's rows in ~lineage from the current FK
definition. Make declaration idempotent for ~lineage, the same way it is for
the actual table structure.
Add a fast post-declare consistency check (or invariant assertion under safemode=True): for each PK attribute, the lineage recorded in ~lineage
matches the lineage that Table.proj() would compute. If not, log a warning
pointing the user at schema.rebuild_lineage().
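The post-declare consistency check could look roughly like the sketch below. It assumes the stored and computed lineages have already been fetched into plain dicts keyed by PK attribute name; how those dicts are obtained from `~lineage` and from `Table.proj()` is left out. `check_lineage_consistency` is a hypothetical name, not an existing DataJoint function.

```python
import logging

logger = logging.getLogger("lineage_check")


def check_lineage_consistency(table_name, stored, computed):
    """Compare stored lineage against lineage derived from current FKs.

    stored / computed: dicts mapping PK attribute name -> lineage string.
    Returns {attr: (stored_value, computed_value)} for every mismatch.
    """
    mismatches = {
        attr: (stored.get(attr), expected)
        for attr, expected in computed.items()
        if stored.get(attr) != expected
    }
    if mismatches:
        # Warn only; the check is meant to be cheap and non-fatal.
        logger.warning(
            "Stale lineage for %s: %s. Consider schema.rebuild_lineage().",
            table_name, mismatches,
        )
    return mismatches


computed = {"image_id": "blob_detection.image_spec.image_id"}
stale_result = check_lineage_consistency("image", {"image_id": None}, computed)
clean_result = check_lineage_consistency("image", dict(computed), computed)
```

Returning the mismatches instead of raising keeps declaration from failing outright, which matches the "log a warning" intent above.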
Medium term (better error)
When the semantic-check on the populate antijoin fails with a None lineage on
one side — which is by construction either a freshly-declared-but-not-yet-saved
table or a stale row — surface a tailored error:
Cannot join on attribute 'X': lineage missing on <table_name>. This usually
indicates a stale `~lineage` entry from an older DataJoint version or an
incomplete redeclare. Run:
schema.rebuild_lineage()
to recompute from current FK definitions.
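The branching behind such a tailored error might be sketched as follows. `lineage_error_message` is a hypothetical helper that only builds the message text; where it would hook into the semantic check is not shown and is not claimed to match DataJoint's internals.

```python
def lineage_error_message(attr, left, right):
    """Build the join error text for one attribute.

    left / right: (table_name, lineage) pairs; lineage may be None.
    """
    missing = [name for name, lineage in (left, right) if lineage is None]
    if missing:
        # One side has no lineage at all: point at the recovery path.
        return (
            f"Cannot join on attribute '{attr}': lineage missing on "
            f"{', '.join(missing)}. This usually indicates a stale `~lineage` "
            "entry from an older DataJoint version or an incomplete redeclare. "
            "Run schema.rebuild_lineage() to recompute from current FK definitions."
        )
    # Both sides have a lineage and they differ: keep the existing wording.
    return (
        f"Cannot join on attribute '{attr}': different lineages "
        f"({left[1]} vs {right[1]}). Use .proj() to rename one of the attributes."
    )


stale_msg = lineage_error_message(
    "image_id",
    ("image", None),
    ("image_spec", "blob_detection.image_spec.image_id"),
)
real_msg = lineage_error_message("image_id", ("a", "s.a.image_id"), ("b", "s.b.image_id"))
print(stale_msg)
```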
Long term (schema versioning)
Tag every ~lineage row with the DJ minor version that wrote it. On read,
if the row's version is older than a known floor (e.g., the version that
introduced clean PG-quote stripping), trigger an in-place rebuild for that
row before returning it. This keeps the cost off the hot path for current-version
schemas while making upgrades self-healing.
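The version-floor test could be as small as the sketch below. It assumes plain numeric `major.minor[.patch]` version strings and treats an untagged row as needing a rebuild; the floor value used in the example is a placeholder, not the actual version that introduced the quote stripping.

```python
def parse_minor(version):
    """'2.1.3' -> (2, 1). Compares numerically, so '2.10' sorts after '2.9'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def needs_rebuild(row_version, floor):
    """True if the row predates the floor or carries no version tag."""
    if row_version is None:
        return True
    return parse_minor(row_version) < parse_minor(floor)


FLOOR = "2.1"  # placeholder floor, for illustration only
print(needs_rebuild("2.0.4", FLOOR))
print(needs_rebuild("2.1.0", FLOOR))
print(needs_rebuild(None, FLOOR))
```

Comparing integer tuples rather than raw strings avoids the usual trap where `"2.10" < "2.9"` lexicographically.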
Repro vector (for whoever picks this up)
We don't have a deterministic reproducer because the failing state on the workspace
is gone (cleaned via drop+redeclare). But a likely synthetic repro:
Apply 6506badb in reverse (or check out master before that SHA) and declare a schema with an
FK against PostgreSQL.
Inspect ~lineage — entries should carry double-quotes inside the lineage
string ("db"."table".attr rather than db.table.attr).
Upgrade DJ to master, don't drop the schema. Try populate() on a child
table — semantic-check failure should reappear.
Confirm schema.rebuild_lineage() clears it; confirm a full schema redeclare
clears it; confirm that just re-running @schema(Table) without recomputing
the row does not clear it.
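When inspecting `~lineage` during the repro, the quoted and unquoted forms can be told apart mechanically. The helpers below are hypothetical and only illustrate the string difference described above; they are not the stripping logic from 6506badb.

```python
def is_quoted_lineage(lineage):
    """True if the lineage string still carries PostgreSQL double quotes."""
    return '"' in lineage


def strip_pg_quotes(lineage):
    """Turn '"db"."table".attr' into 'db.table.attr'."""
    return lineage.replace('"', "")


old_style = '"db"."table".attr'
print(is_quoted_lineage(old_style))
print(strip_pg_quotes(old_style))
```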
What was not the cause (closing #1453's misdirection)
The populate antijoin's semantic-check at (self._jobs_to_do(restrictions) - self.proj()) is the right design. Disabling
it (as #1453 proposed) would silence the entire class of correctness bugs that #1405 added it to catch. This issue is specifically about making the lineage table trustworthy, not about loosening the check that reads it.