Context
Depends on: #741
Allele rows at each level are real variants that can be independently annotated. Because Allele rows are shared across score sets by VRS digest, annotation results are stored once per allele and shared across all score sets that reference it. Annotation jobs must check for an existing current annotation before re-annotating.
Annotation architecture
AnnotationStatus is QC/audit only — it tracks whether the pipeline ran and whether it succeeded. The actual annotation data lives in dedicated per-type tables (VEPAnnotation, GnomADVariant, ClinicalControl), each with superseded_at for temporal support.
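As a rough illustration of the split between QC rows and annotation data, here is a minimal SQLite sketch of one per-type table next to the status table. Only `allele_id`, `created_at`, `superseded_at`, `level` and `clingen_allele_id` come from this issue. The table names `vep_annotations` and `annotation_status` and the columns `source_version`, `consequence`, `job_run_id` and `succeeded` are hypothetical stand-ins for the real models. The partial unique index is an assumed extra guard, not something the issue specifies.

```python
import sqlite3

# Hypothetical table/column names; only allele_id, created_at, superseded_at,
# level and clingen_allele_id are taken from the issue text.
SCHEMA = """
CREATE TABLE alleles (
    id INTEGER PRIMARY KEY,
    level TEXT NOT NULL CHECK (level IN ('genomic', 'coding', 'protein')),
    clingen_allele_id TEXT              -- stable, so no temporal table
);

-- Annotation payload: one row per produced annotation, versioned in time.
CREATE TABLE vep_annotations (
    id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL REFERENCES alleles(id),
    source_version TEXT NOT NULL,
    consequence TEXT NOT NULL,
    created_at TEXT NOT NULL,
    superseded_at TEXT                  -- NULL while the row is current
);

-- Optional guard (an assumption, not stated in the issue): the database
-- rejects a second current row for the same allele.
CREATE UNIQUE INDEX vep_one_current_per_allele
    ON vep_annotations (allele_id) WHERE superseded_at IS NULL;

-- QC/audit only: did a run happen for this allele, and did it succeed?
-- Deliberately carries no annotation payload.
CREATE TABLE annotation_status (
    id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL REFERENCES alleles(id),
    annotation_type TEXT NOT NULL,      -- e.g. 'vep', 'gnomad', 'clinvar'
    job_run_id TEXT NOT NULL,
    succeeded INTEGER NOT NULL,         -- 0 or 1
    created_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

With the partial index in place, a job that forgets to supersede the old row fails loudly instead of silently leaving two current annotations.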
When a new annotation is produced for an allele that already has a current annotation of that type, the existing row's superseded_at is set to NOW() in the same transaction that inserts the new row, and the new row's created_at uses that same timestamp. Because the two rows share that boundary, the temporal query WHERE created_at <= T AND (superseded_at IS NULL OR superseded_at > T) returns exactly one row for any point in time T at or after the allele's first annotation of that type, and none before it.
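A minimal sketch of the supersede-then-insert step and the as-of query, using SQLite from Python's standard library so it runs anywhere. The table name `vep_annotations` and the helpers `record_annotation` and `annotation_as_of` are hypothetical. Timestamps are passed in explicitly (ISO-8601 strings, which compare correctly as text) instead of calling `NOW()`, so both statements share one value. If the backing database is PostgreSQL (an assumption), `NOW()` returns the transaction's start time, which gives the same shared-timestamp behavior inside one transaction.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vep_annotations (
    id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL,
    consequence TEXT NOT NULL,
    created_at TEXT NOT NULL,
    superseded_at TEXT          -- NULL while the row is current
);
"""

def record_annotation(conn, allele_id, consequence, now):
    """Supersede the current row (if any) and insert the new one atomically."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(
            "UPDATE vep_annotations SET superseded_at = ? "
            "WHERE allele_id = ? AND superseded_at IS NULL",
            (now, allele_id),
        )
        conn.execute(
            "INSERT INTO vep_annotations (allele_id, consequence, created_at) "
            "VALUES (?, ?, ?)",
            (allele_id, consequence, now),
        )

def annotation_as_of(conn, allele_id, t):
    """Return the consequences valid at time t (at most one by construction)."""
    rows = conn.execute(
        "SELECT consequence FROM vep_annotations "
        "WHERE allele_id = ? AND created_at <= ? "
        "AND (superseded_at IS NULL OR superseded_at > ?)",
        (allele_id, t, t),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_annotation(conn, 1, "synonymous_variant", "2024-01-01T00:00:00")
record_annotation(conn, 1, "missense_variant", "2024-06-01T00:00:00")
print(annotation_as_of(conn, 1, "2024-03-01T00:00:00"))  # ['synonymous_variant']
print(annotation_as_of(conn, 1, "2024-06-01T00:00:00"))  # ['missense_variant']
```

At the exact boundary timestamp the old row is excluded (`superseded_at > T` is false) and the new row is included (`created_at <= T` is true), which is why sharing one timestamp avoids both gaps and overlaps.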
Level-to-annotation routing
All routing is driven by the level column on the flat alleles table:
- gnomAD allele frequency → level = 'genomic' → stored in GnomADVariant
- VEP functional consequence → level = 'genomic' or level = 'coding' → stored in VEPAnnotation
- ClinVar / clinical controls → level = 'coding' or level = 'protein' → stored in ClinicalControl
- ClinGen allele ID → stored as the clingen_allele_id column on the alleles table (stable, no temporal table needed)
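The routing list can be captured as a single lookup table that each annotation job consults. This is a hypothetical sketch: `LEVEL_ROUTES`, `tables_for_level` and `levels_for_table` are illustrative names, and rejecting unknown levels with `ValueError` is an assumed policy. ClinGen allele ID is left out because it is a plain column on `alleles` rather than a routed table.

```python
# Hypothetical routing table mirroring the level-to-annotation routing:
# allele level -> per-type tables whose jobs should pick that allele up.
LEVEL_ROUTES: dict[str, tuple[str, ...]] = {
    "genomic": ("GnomADVariant", "VEPAnnotation"),
    "coding": ("VEPAnnotation", "ClinicalControl"),
    "protein": ("ClinicalControl",),
}

def tables_for_level(level: str) -> tuple[str, ...]:
    """Return the annotation tables that apply to an allele at `level`."""
    try:
        return LEVEL_ROUTES[level]
    except KeyError:
        raise ValueError(f"unknown allele level: {level!r}") from None

def levels_for_table(table: str) -> list[str]:
    """Inverse lookup: the levels a given annotation job should select."""
    return [level for level, tables in LEVEL_ROUTES.items() if table in tables]

print(tables_for_level("coding"))         # ('VEPAnnotation', 'ClinicalControl')
print(levels_for_table("VEPAnnotation"))  # ['genomic', 'coding']
```

Keeping the mapping in one place lets each job build its allele selection from `levels_for_table(...)` (for example a `level IN (...)` filter) instead of hard-coding levels per job.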
Acceptance Criteria
- VEP annotation job creates VEPAnnotation rows linked by allele_id; sets superseded_at on the previous current row in the same transaction
- gnomAD annotation job creates GnomADVariant rows linked by allele_id; sets superseded_at on the previous current row in the same transaction
- ClinVar annotation job creates ClinicalControl rows linked by allele_id; sets superseded_at on the previous current row in the same transaction
- AnnotationStatus rows are created per job run per allele_id for QC tracking only
- Annotation jobs skip alleles that already have a current annotation of the same type and source version (cross-score-set deduplication)
- Temporal query pattern is verified: querying annotation state at a past timestamp returns the correct historical annotation
- Existing AssayedVariant-level annotation behavior is unchanged
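The deduplication and QC criteria can be sketched together for the gnomAD job. This is a minimal SQLite version under stated assumptions: the table and column names, the `fetch` callable standing in for the real gnomAD lookup, the version strings, and the helpers `alleles_to_annotate` and `run_gnomad_job` are all hypothetical. The job drops alleles whose current row already carries the job's source version before doing any lookup. It then supersedes, inserts, and writes the status row in one transaction.

```python
import sqlite3
import uuid
from typing import Callable, Iterable

SCHEMA = """
CREATE TABLE gnomad_variants (
    id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL,
    source_version TEXT NOT NULL,
    allele_frequency REAL,
    created_at TEXT NOT NULL,
    superseded_at TEXT
);
CREATE TABLE annotation_status (
    id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL,
    annotation_type TEXT NOT NULL,
    job_run_id TEXT NOT NULL,
    succeeded INTEGER NOT NULL
);
"""

def alleles_to_annotate(conn, allele_ids: Iterable[int], source_version: str) -> list[int]:
    """Drop alleles that already have a current row from this source version."""
    current = {
        row[0]
        for row in conn.execute(
            "SELECT allele_id FROM gnomad_variants "
            "WHERE superseded_at IS NULL AND source_version = ?",
            (source_version,),
        )
    }
    return [a for a in allele_ids if a not in current]

def _log_status(conn, allele_id: int, job_run_id: str, succeeded: bool) -> None:
    """Write one QC row; it carries no annotation data."""
    conn.execute(
        "INSERT INTO annotation_status (allele_id, annotation_type, job_run_id, succeeded) "
        "VALUES (?, 'gnomad', ?, ?)",
        (allele_id, job_run_id, int(succeeded)),
    )

def run_gnomad_job(conn, allele_ids, source_version: str, now: str,
                   fetch: Callable[[int], float]) -> str:
    """Annotate only alleles lacking a current row for `source_version`."""
    job_run_id = str(uuid.uuid4())
    for allele_id in alleles_to_annotate(conn, allele_ids, source_version):
        try:
            frequency = fetch(allele_id)  # stand-in for the real gnomAD lookup
        except Exception:  # record the failure for QC and continue
            with conn:
                _log_status(conn, allele_id, job_run_id, succeeded=False)
            continue
        with conn:  # supersede, insert and log atomically
            conn.execute(
                "UPDATE gnomad_variants SET superseded_at = ? "
                "WHERE allele_id = ? AND superseded_at IS NULL",
                (now, allele_id),
            )
            conn.execute(
                "INSERT INTO gnomad_variants "
                "(allele_id, source_version, allele_frequency, created_at) "
                "VALUES (?, ?, ?, ?)",
                (allele_id, source_version, frequency, now),
            )
            _log_status(conn, allele_id, job_run_id, succeeded=True)
    return job_run_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
looked_up = []

def fake_fetch(allele_id):
    looked_up.append(allele_id)
    return 0.001

run_gnomad_job(conn, [1, 2], "v4", "2024-01-01T00:00:00", fake_fetch)
# A second score set sharing allele 2 only triggers a lookup for allele 3.
run_gnomad_job(conn, [2, 3], "v4", "2024-02-01T00:00:00", fake_fetch)
print(looked_up)  # [1, 2, 3]
```

This issue does not say whether skipped alleles should also get an AnnotationStatus row; the sketch writes one only for alleles it actually attempts. Two concurrent runs could still both pass the check before either inserts, so a database-level constraint on current rows may be worth adding.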