Summary
The API test suite has no agreed way to construct DB-backed domain objects. It leans on pytest fixture cascades, which work for infrastructure but scale poorly for the domain's many related entities. This issue asks for a decision: adopt either factory_boy (SQLAlchemy model factories with subfactories) or explicit scenario-builder functions, then roll the choice out incrementally. Fixtures stay for genuine infrastructure; this is specifically about domain object construction.
Problem
DB-backed setup currently flows through cascade fixtures (setup_lib_db → setup_lib_db_with_score_set → setup_lib_db_with_variant). The cascade:
- Hides state. A leaf fixture implies an unseen chain of entities (a variant fixture silently pulls in a score set, experiment, experiment set, users, licenses). The test signature doesn't reveal what world it runs in.
- Fits poorly off the happy path. When a test needs "most of scaffold A but one piece different," you either accept excess/irrelevant state or spawn yet another cascade variant.
- Proliferates. Each new shape becomes another fixture, and the construction logic gets reinvented per directory.
This is already visible in the recent annotation-pipeline allele refactor, where two test files written together implement allele creation in two different ways: tests/models/test_annotation_event_model.py defines an allele pytest fixture, while tests/models/test_annotation_event_view.py defines a plain _allele(session, digest, level) helper plus an _event(...) factory-style function. The instinct to write factory functions is already emerging ad hoc; the open question is whether to formalize it and how.
Proposed behavior
Pick one construction strategy and document it as the convention for new DB-backed test objects:
Option 1 — factory_boy (SQLAlchemyModelFactory + SubFactory for relationship chains).
- Pros: declarative; a single create() builds the whole relationship chain with sensible defaults; override only the fields a test cares about; canonical, well-documented solution.
- Cons: new framework dependency and DSL to learn; SQLAlchemy session wiring per test has known friction; transitional period where factories coexist with cascade fixtures.
Option 2 — explicit scenario-builder functions (e.g. build_annotation_scenario(session) returning a named struct of the created pieces, called in the test's arrange block).
- Pros: explicit, readable, debuggable; no new dependency; directly extends the existing tests/helpers/util/ pattern (e.g. the HTTP-level create_seq_score_set).
- Cons: lower ceiling; deep relationship chains are less automatic and must be wired by hand.
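A minimal sketch of Option 2. build_variant_scenario and VariantScenario are hypothetical names modeled on the build_annotation_scenario(session) example; the sketch builds the user, score set, variant chain from the Problem section because the annotation models' fields aren't shown here. Plain dataclasses and a FakeSession stand in for the real SQLAlchemy models and session so the sketch runs on its own; with real models the builder body should carry over largely unchanged, since it only calls add() and flush().

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for the real SQLAlchemy models.
@dataclass
class User:
    username: str


@dataclass
class ScoreSet:
    title: str
    owner: User


@dataclass
class Variant:
    hgvs_nt: str
    score_set: ScoreSet


class FakeSession:
    """Records calls to the add()/flush() subset of a SQLAlchemy Session."""

    def __init__(self) -> None:
        self.added: list[Any] = []
        self.flushes = 0

    def add(self, obj: Any) -> None:
        self.added.append(obj)

    def flush(self) -> None:
        self.flushes += 1


@dataclass(frozen=True)
class VariantScenario:
    """Named handles to every object the builder created."""

    owner: User
    score_set: ScoreSet
    variants: list[Variant]


def build_variant_scenario(
    session: Any,
    *,
    owner: Optional[User] = None,
    score_set_title: str = "Test score set",
    hgvs: tuple[str, ...] = ("c.1A>G",),
) -> VariantScenario:
    """Create owner -> score set -> variants; any default can be overridden."""
    if owner is None:
        owner = User(username="test-user")
    score_set = ScoreSet(title=score_set_title, owner=owner)
    variants = [Variant(hgvs_nt=h, score_set=score_set) for h in hgvs]
    for obj in (owner, score_set, *variants):
        session.add(obj)
    session.flush()  # with a real Session this assigns primary keys, no commit
    return VariantScenario(owner=owner, score_set=score_set, variants=variants)
```

In a test, the call sits in the arrange block next to the session fixture, e.g. scenario = build_variant_scenario(session, score_set_title="Unpublished"), and the test reads scenario.score_set and scenario.variants by name instead of inheriting them from a fixture chain. The "most of scaffold A but one piece different" case becomes a keyword override rather than a new fixture.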
Whichever is chosen, the convention must state that genuine infrastructure (session, client, base setup_lib_db) stays as fixtures — only domain object construction moves.
Acceptance criteria
Implementation notes
- Non-DB mock factories already exist in tests/helpers/mocks/factories.py; DB-backed construction is the new, missing piece. If factory_boy is chosen, keep DB factories distinct from those mock factories to avoid confusing the two layers.
- If factory_boy is chosen, resolve session injection (binding the active test session to factories) up front, since that is the main friction point, and verify that it composes with the per-test PostgreSQL fixture.
- This issue depends on, and pairs with, the narrower fixture-deduplication cleanup tracked separately: that one consolidates the existing cascade fixtures, while this one decides the longer-term construction strategy. Sequence them so the dedup work isn't redone.
- Scope guard: this is a direction-setting decision plus a reference implementation, not a suite-wide rewrite.