Complete Rewrite of varsome-api-client-python#29
Conversation
BREAKING CHANGE: models/elements/* consolidated into models/annotation.py; CLI moved from scripts/ to varsome_api/cli/ package; client and VCF interfaces rewritten. - Migrate build tooling from setup.py/tox.ini to Poetry - Rewrite client.py with improved async/sync support via _sync.py - Rewrite vcf.py with expanded VCF annotation handling - Consolidate all fragmented models/elements/* into models/annotation.py - Add models/slim/ for lightweight annotation models - Restructure CLI into varsome_api/cli/ package (varsome_api_run, varsome_api_annotate_vcf, utils) - Expand exception hierarchy in exceptions.py - Add GitHub Actions workflows for tests and Docker release - Add Dockerfile, .dockerignore, and Docker documentation - Add parquet annotation example under examples/parquet/ - Expand test suite: test_cli, test_models, test_vcf, test_sync, test_exception with fixtures - Rewrite README and add docs/developer-guide.md, docs/docker.md - Add .editorconfig, .flake8, updated .pre-commit-config.yaml
16a6afb to
faa321c
Compare
Python Test Results216 tests 209 ✅ 2s ⏱️ For more details on these failures, see this check. Results for commit faa321c. |
Python Test Results216 tests 216 ✅ 1s ⏱️ Results for commit 6dbad8a. ♻️ This comment has been updated with latest results. |
6dbad8a to
15054e9
Compare
Python Test Results (3.13)216 tests 216 ✅ 0s ⏱️ Results for commit 91fa98f. ♻️ This comment has been updated with latest results. |
Python Test Results (3.11)216 tests 216 ✅ 2s ⏱️ Results for commit 91fa98f. ♻️ This comment has been updated with latest results. |
Python Test Results (3.14)216 tests 216 ✅ 1s ⏱️ Results for commit 91fa98f. ♻️ This comment has been updated with latest results. |
Python Test Results (3.12)216 tests 216 ✅ 2s ⏱️ Results for commit 91fa98f. ♻️ This comment has been updated with latest results. |
15054e9 to
6b1b6a1
Compare
… types Refactor the VarSomeAPIClient to handle queries for variants, genes, and CNVs. Additionally, the CLI tool and developer guide have been updated to reflect these changes
There was a problem hiding this comment.
Pull request overview
This PR is a full modernization/rewrite of the varsome_api Python client to support newer Python versions (3.11–3.14), migrate packaging/tooling to Poetry + GitHub Actions + Docker, and redesign the client/VCF/CLI layers around async-first I/O with sync wrappers.
Changes:
- Rewrote the HTTP client to use
aiohttp, added async batch streaming with a sync wrapper (_sync.py). - Replaced legacy
jsonmodelselement models with Pydantic v2 models (generatedmodels/annotation.py+ slim models for performance). - Rebuilt VCF annotation to use
pysam(optional[vcf]extra) and moved CLI tools intovarsome_api/cli/with Poetry entry points.
Reviewed changes
Copilot reviewed 67 out of 75 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| version.py | Removed legacy version constant file. |
| varsome_api/vcf.py | Rewritten VCF annotator built on pysam + async batching. |
| varsome_api/models/variant.py | Replaced old jsonmodels variant with Pydantic-based model + convenience mixin. |
| varsome_api/models/slim/annotation.py | Added slim Pydantic model for VCF pipelines. |
| varsome_api/models/slim/init.py | Exported slim model symbols. |
| varsome_api/models/fields.py | Removed jsonmodels custom fields. |
| varsome_api/models/elements/wustl.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/uniprot.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/transcript.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/thousand_genomes.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/sanger.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/ncbi.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/isb.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/icgc.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/iarc.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/gwas.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/gnomad.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/gerp.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/dbnsfp.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/dann.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/broad.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/acmg.py | Removed legacy jsonmodels element definitions. |
| varsome_api/models/elements/init.py | Removed legacy wildcard element exports. |
| varsome_api/models/init.py | Added explicit exports for new models. |
| varsome_api/log.py | Introduced library logger with NullHandler. |
| varsome_api/exceptions.py | Added VarSomeAPIException with HTTP status mapping. |
| varsome_api/constants.py | Centralized literal types/defaults for ref genome + query types. |
| varsome_api/client.py | Rewritten API client using aiohttp, async batch generator, and sync wrappers. |
| varsome_api/cli/varsome_api_run.py | New CLI entry point for lookup (variants/genes/CNVs). |
| varsome_api/cli/varsome_api_annotate_vcf.py | New CLI entry point for VCF annotation. |
| varsome_api/cli/utils.py | Shared CLI parsing/validation/output helpers. |
| varsome_api/cli/init.py | Added CLI package marker. |
| varsome_api/_sync.py | Added run_sync + sync_wrapper utilities. |
| varsome_api/init.py | Added package __version__. |
| tox.ini | Removed legacy tox config. |
| tests/test_vcf.py | Added VCF annotator test suite. |
| tests/test_sync.py | Added tests for sync-wrapper utilities. |
| tests/test_models.py | Added tests for full vs slim model deserialization + mixin properties. |
| tests/test_exception.py | Added tests for VarSomeAPIException. |
| tests/test_cli.py | Added CLI utilities + parser validation tests. |
| tests/fixtures/variants.vcf | Added VCF fixture for tests. |
| tests/conftest.py | Added shared fixtures/mocks for request stubbing. |
| setup.py | Removed legacy setuptools packaging. |
| scripts/varsome_api_run.py | Removed legacy scripts-based CLI. |
| scripts/varsome_api_annotate_vcf.py | Removed legacy scripts-based CLI. |
| pyproject.toml | Migrated to Poetry packaging, deps, extras, pytest + coverage config. |
| MANIFEST.in | Removed legacy manifest. |
| examples/README.md | Added examples index doc. |
| examples/parquet/writer.py | Added Parquet writer example support code. |
| examples/parquet/requirements.txt | Added example-local requirements. |
| examples/parquet/models.py | Added Parquet schema + row conversion. |
| examples/parquet/annotate_to_parquet.py | Added runnable CSV→Parquet annotation example. |
| docs/docker.md | Added Docker usage guide. |
| docs/developer-guide.md | Added developer guide for client/VCF/models usage. |
| Dockerfile | Added multi-stage Docker build with [vcf] extra. |
| CHANGELOG.md | Updated changelog for v1 beta releases. |
| .pre-commit-config.yaml | Updated pre-commit hooks; added Poetry lock/export hooks. |
| .gitignore | Replaced with expanded Python gitignore template. |
| .github/workflows/python-tests.yml | Added CI matrix for Python 3.11–3.14 with Poetry installs. |
| .github/workflows/docker-release.yml | Added GHCR multi-arch docker publish on release. |
| .flake8 | Added flake8 config. |
| .editorconfig | Added EditorConfig defaults. |
| .dockerignore | Added docker ignore rules (excludes tests/docs/examples/etc.). |
| .cz.toml | Updated commitizen versioning to new version sources. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nt.py Updated the error handling logic to check for response status codes greater than or equal to 400
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 67 out of 75 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| path = kwargs.get("path", "<unknown path>") | ||
| method = kwargs.get("method", "GET") | ||
| json_body = kwargs.get("json") | ||
| if method == "POST" and isinstance(json_body, dict): | ||
| variant_count = len(json_body.get("variants", [])) | ||
| else: | ||
| variant_count = 1 |
There was a problem hiding this comment.
_log_request_time only counts batch size from the "variants" JSON key. For gene batches (payload key "genes") this will log 0 variant(s) and make request timing logs misleading. Consider detecting the batch list generically (e.g., first list value in the JSON body, or handle both variants/genes).
Previously, the request timing logger only counted the number of items in the "variants" key for POST requests, which missed batch requests using the "genes" key (e.g., gene annotation lookups). Now, the logger checks for either "variants" or "genes" in the JSON payload and logs the count accordingly. No functional change to request handling; only logging is affected.
…arify -p usage Replace all usage of `add-all-data=1` with `add-ACMG-annotation=1` in README and Docker documentation to reflect current API defaults. Clarify that for VCF annotation, using `-p` with parameters other than the default may not work as expected, and recommend sticking to the default or consulting documentation for custom pipelines. Add explicit examples for combining multiple `-p` parameters for single variant annotation. Remove `-p` from VCF annotation Docker examples where not required
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 67 out of 75 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (14)
varsome_api/client.py:1
aiohttp.ClientSessionis being constructed withself.api_urlas a positional argument. In aiohttp, the base URL is expected via thebase_url=keyword; passing a string positionally can be interpreted as a different parameter (or fail depending on aiohttp version), breaking all requests. Useaiohttp.ClientSession(base_url=self.api_url, headers=self._headers)in both places.
varsome_api/client.py:1aiohttp.ClientSessionis being constructed withself.api_urlas a positional argument. In aiohttp, the base URL is expected via thebase_url=keyword; passing a string positionally can be interpreted as a different parameter (or fail depending on aiohttp version), breaking all requests. Useaiohttp.ClientSession(base_url=self.api_url, headers=self._headers)in both places.
varsome_api/client.py:1isinstance(items, AsyncIterable)will raiseTypeErrorat runtime whenAsyncIterablecomes fromtyping(typing generics are not valid inisinstancechecks). ImportAsyncIterablefromcollections.abc(or check for__aiter__) to make this safe.
varsome_api/client.py:1isinstance(items, AsyncIterable)will raiseTypeErrorat runtime whenAsyncIterablecomes fromtyping(typing generics are not valid inisinstancechecks). ImportAsyncIterablefromcollections.abc(or check for__aiter__) to make this safe.
varsome_api/client.py:1- The client advertises
zstdinAccept-Encoding, but aiohttp does not natively decode zstd content encoding in many configurations. If the server selects zstd, responses may fail to decode. Consider removingzstdunless you have guaranteed decoding support.
varsome_api/client.py:1 _make_requestis annotated to returndict[str, Any], but batch endpoints return a JSON list (and callers already treat it aslist[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throwaiohttp.ContentTypeError/ JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type toAny(ordict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else readresponse.text()).
varsome_api/client.py:1_make_requestis annotated to returndict[str, Any], but batch endpoints return a JSON list (and callers already treat it aslist[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throwaiohttp.ContentTypeError/ JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type toAny(ordict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else readresponse.text()).
varsome_api/client.py:1_make_requestis annotated to returndict[str, Any], but batch endpoints return a JSON list (and callers already treat it aslist[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throwaiohttp.ContentTypeError/ JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type toAny(ordict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else readresponse.text()).
varsome_api/client.py:1- The log message contains a grammatical typo:
"genes(s)"should be"gene(s)"(or rephrase to avoid the awkward pluralization).
varsome_api/vcf.py:1 variant_result.original_variantis nullable on the slim model, butannotate_recordexpects astr. PassingNonehere can lead to INFO fields likeoriginal_variantbeing written asNone(or raising when writing). Use the request query string (variant) as the fallback:variant_result.original_variant or variant.
varsome_api/vcf.py:1- This unconditionally overwrites
record.pos/record.alleleseven when the API returnsNoneforpos/ref/alt, despite the docstring stating these are only updated when non-None. Concretely: settingrecord.pos = Nonecan raise, and inserting"."into alleles can produce invalid alleles in output. Only updatepos/alleleswhen the corresponding API fields are notNone, otherwise keep the original record values.
varsome_api/models/variant.py:1 list(set(...))returns genes in a non-deterministic order, which can create noisy diffs and unstable output (e.g., VCF INFO ordering). Prefer a deterministic approach (e.g., preserve first-seen order viadict.fromkeys(...), orsorted(set(...))if ordering is not important).
varsome_api/exceptions.py:1- When
statusisNone, the string becomesNone (Connection refused)/None (Unknown error.), which is awkward for end users and logs. Consider omitting the status prefix entirely whenstatus is None(e.g., return just the detail), while keeping the current format for HTTP status errors.
varsome_api/cli/utils.py:1 configure_logging()adds a newStreamHandlerevery time it’s called, which can duplicate log lines in repeated invocations (tests, multiple CLI entry points, or library consumers). Guard against duplicate handlers (e.g., only add if no similar handler exists, or clear existing handlers first).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What Changed
models/elements/*→ unifiedmodels/annotation.py_sync.pyscripts/→varsome_api/cli/package (3 entry points)Breaking Changes⚠️
models.elements.*→models.annotationvarsome_api_run,varsome_api_annotate_vcfinstead of their .py equivalents in the previous versionsRelated to #28