Skip to content

Complete Rewrite of varsome-api-client-python#29

Draft
ckopanos wants to merge 8 commits intomasterfrom
refactor/rewrite-v1
Draft

Complete Rewrite of varsome-api-client-python#29
ckopanos wants to merge 8 commits intomasterfrom
refactor/rewrite-v1

Conversation

@ckopanos
Copy link
Copy Markdown
Member

@ckopanos ckopanos commented Mar 29, 2026

What Changed

  • Python Compatibility: Drop Python <3.11, add support for 3.11, 3.12, 3.13, 3.14
  • Build: Poetry, GitHub Actions, Docker, pre-commit (modernize tooling)
  • Models: Consolidated models/elements/* → unified models/annotation.py
  • Client: Rewritten with better async/sync support via _sync.py
  • VCF: Completely rewritten — previous library incompatible with Python >3.10
  • CLI: Moved from scripts/varsome_api/cli/ package (3 entry points)
  • Tests: Expanded to cover for the client functionality
  • Docs: New developer guide, Docker guide, parquet example

Breaking Changes ⚠️

  • Minimum Python version raised to 3.11
  • All imports from models.elements.*models.annotation
  • CLI commands now use entry points: varsome_api_run, varsome_api_annotate_vcf instead of their .py equivalents in the previous versions
  • Client and VCF APIs redesigned

Related to #28

BREAKING CHANGE: models/elements/* consolidated into models/annotation.py;
CLI moved from scripts/ to varsome_api/cli/ package; client and VCF
interfaces rewritten.

- Migrate build tooling from setup.py/tox.ini to Poetry
- Rewrite client.py with improved async/sync support via _sync.py
- Rewrite vcf.py with expanded VCF annotation handling
- Consolidate all fragmented models/elements/* into models/annotation.py
- Add models/slim/ for lightweight annotation models
- Restructure CLI into varsome_api/cli/ package (varsome_api_run,
  varsome_api_annotate_vcf, utils)
- Expand exception hierarchy in exceptions.py
- Add GitHub Actions workflows for tests and Docker release
- Add Dockerfile, .dockerignore, and Docker documentation
- Add parquet annotation example under examples/parquet/
- Expand test suite: test_cli, test_models, test_vcf,
  test_sync, test_exception with fixtures
- Rewrite README and add docs/developer-guide.md, docs/docker.md
- Add .editorconfig, .flake8, updated .pre-commit-config.yaml
@ckopanos ckopanos force-pushed the refactor/rewrite-v1 branch 2 times, most recently from 16a6afb to faa321c Compare March 29, 2026 10:17
@github-actions
Copy link
Copy Markdown

Python Test Results

216 tests   209 ✅  2s ⏱️
  1 suites    0 💤
  1 files      7 ❌

For more details on these failures, see this check.

Results for commit faa321c.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Python Test Results

216 tests   216 ✅  1s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 6dbad8a.

♻️ This comment has been updated with latest results.

@ckopanos ckopanos force-pushed the refactor/rewrite-v1 branch 4 times, most recently from 6dbad8a to 15054e9 Compare March 29, 2026 10:53
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Python Test Results (3.13)

216 tests   216 ✅  0s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 91fa98f.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Python Test Results (3.11)

216 tests   216 ✅  2s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 91fa98f.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Python Test Results (3.14)

216 tests   216 ✅  1s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 91fa98f.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

Python Test Results (3.12)

216 tests   216 ✅  2s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit 91fa98f.

♻️ This comment has been updated with latest results.

@ckopanos ckopanos force-pushed the refactor/rewrite-v1 branch from 15054e9 to 6b1b6a1 Compare March 29, 2026 11:09
Comment thread tests/test_cli.py Fixed
Comment thread tests/test_cli.py Fixed
Comment thread tests/test_cli.py Fixed
Comment thread varsome_api/vcf.py Fixed
Comment thread varsome_api/client.py Fixed
@ckopanos ckopanos linked an issue Mar 29, 2026 that may be closed by this pull request
ckopanos added 2 commits April 7, 2026 15:20
… types

Refactor the VarSomeAPIClient to handle queries for variants, genes, and
CNVs.
Additionally, the CLI tool and developer guide have been updated to reflect
these changes
Copilot AI review requested due to automatic review settings April 7, 2026 12:22
Comment thread varsome_api/client.py Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a full modernization/rewrite of the varsome_api Python client to support newer Python versions (3.11–3.14), migrate packaging/tooling to Poetry + GitHub Actions + Docker, and redesign the client/VCF/CLI layers around async-first I/O with sync wrappers.

Changes:

  • Rewrote the HTTP client to use aiohttp, added async batch streaming with a sync wrapper (_sync.py).
  • Replaced legacy jsonmodels element models with Pydantic v2 models (generated models/annotation.py + slim models for performance).
  • Rebuilt VCF annotation to use pysam (optional [vcf] extra) and moved CLI tools into varsome_api/cli/ with Poetry entry points.

Reviewed changes

Copilot reviewed 67 out of 75 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
version.py Removed legacy version constant file.
varsome_api/vcf.py Rewritten VCF annotator built on pysam + async batching.
varsome_api/models/variant.py Replaced old jsonmodels variant with Pydantic-based model + convenience mixin.
varsome_api/models/slim/annotation.py Added slim Pydantic model for VCF pipelines.
varsome_api/models/slim/init.py Exported slim model symbols.
varsome_api/models/fields.py Removed jsonmodels custom fields.
varsome_api/models/elements/wustl.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/uniprot.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/transcript.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/thousand_genomes.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/sanger.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/ncbi.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/isb.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/icgc.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/iarc.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/gwas.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/gnomad.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/gerp.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/dbnsfp.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/dann.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/broad.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/acmg.py Removed legacy jsonmodels element definitions.
varsome_api/models/elements/init.py Removed legacy wildcard element exports.
varsome_api/models/init.py Added explicit exports for new models.
varsome_api/log.py Introduced library logger with NullHandler.
varsome_api/exceptions.py Added VarSomeAPIException with HTTP status mapping.
varsome_api/constants.py Centralized literal types/defaults for ref genome + query types.
varsome_api/client.py Rewritten API client using aiohttp, async batch generator, and sync wrappers.
varsome_api/cli/varsome_api_run.py New CLI entry point for lookup (variants/genes/CNVs).
varsome_api/cli/varsome_api_annotate_vcf.py New CLI entry point for VCF annotation.
varsome_api/cli/utils.py Shared CLI parsing/validation/output helpers.
varsome_api/cli/init.py Added CLI package marker.
varsome_api/_sync.py Added run_sync + sync_wrapper utilities.
varsome_api/init.py Added package __version__.
tox.ini Removed legacy tox config.
tests/test_vcf.py Added VCF annotator test suite.
tests/test_sync.py Added tests for sync-wrapper utilities.
tests/test_models.py Added tests for full vs slim model deserialization + mixin properties.
tests/test_exception.py Added tests for VarSomeAPIException.
tests/test_cli.py Added CLI utilities + parser validation tests.
tests/fixtures/variants.vcf Added VCF fixture for tests.
tests/conftest.py Added shared fixtures/mocks for request stubbing.
setup.py Removed legacy setuptools packaging.
scripts/varsome_api_run.py Removed legacy scripts-based CLI.
scripts/varsome_api_annotate_vcf.py Removed legacy scripts-based CLI.
pyproject.toml Migrated to Poetry packaging, deps, extras, pytest + coverage config.
MANIFEST.in Removed legacy manifest.
examples/README.md Added examples index doc.
examples/parquet/writer.py Added Parquet writer example support code.
examples/parquet/requirements.txt Added example-local requirements.
examples/parquet/models.py Added Parquet schema + row conversion.
examples/parquet/annotate_to_parquet.py Added runnable CSV→Parquet annotation example.
docs/docker.md Added Docker usage guide.
docs/developer-guide.md Added developer guide for client/VCF/models usage.
Dockerfile Added multi-stage Docker build with [vcf] extra.
CHANGELOG.md Updated changelog for v1 beta releases.
.pre-commit-config.yaml Updated pre-commit hooks; added Poetry lock/export hooks.
.gitignore Replaced with expanded Python gitignore template.
.github/workflows/python-tests.yml Added CI matrix for Python 3.11–3.14 with Poetry installs.
.github/workflows/docker-release.yml Added GHCR multi-arch docker publish on release.
.flake8 Added flake8 config.
.editorconfig Added EditorConfig defaults.
.dockerignore Added docker ignore rules (excludes tests/docs/examples/etc.).
.cz.toml Updated commitizen versioning to new version sources.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread varsome_api/client.py
Comment thread varsome_api/client.py
Comment thread varsome_api/vcf.py
Comment thread varsome_api/vcf.py
Comment thread varsome_api/vcf.py
Comment thread varsome_api/cli/utils.py
Comment thread .editorconfig Outdated
Comment thread varsome_api/client.py Fixed
…nt.py

Updated the error handling logic to check for response status codes
greater than or equal to 400
Copilot AI review requested due to automatic review settings April 7, 2026 14:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 67 out of 75 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread varsome_api/client.py Outdated
Comment on lines +69 to +75
path = kwargs.get("path", "<unknown path>")
method = kwargs.get("method", "GET")
json_body = kwargs.get("json")
if method == "POST" and isinstance(json_body, dict):
variant_count = len(json_body.get("variants", []))
else:
variant_count = 1
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_log_request_time only counts batch size from the "variants" JSON key. For gene batches (payload key "genes") this will log 0 variant(s) and make request timing logs misleading. Consider detecting the batch list generically (e.g., first list value in the JSON body, or handle both variants/genes).

Copilot uses AI. Check for mistakes.
Previously, the request timing logger only counted the number of items in
the "variants" key for POST requests, which missed batch requests using
the "genes" key (e.g., gene annotation lookups). Now, the logger checks
for either "variants" or "genes" in the JSON payload and logs the count
accordingly.

No functional change to request handling; only logging is affected.
…arify -p usage

Replace all usage of `add-all-data=1` with `add-ACMG-annotation=1` in README and
Docker documentation to reflect current API defaults.

Clarify that for VCF annotation, using `-p` with parameters other than the default
may not work as expected, and recommend sticking to the default or consulting
documentation for custom pipelines. Add explicit examples for combining multiple
`-p` parameters for single variant annotation.

Remove `-p` from VCF annotation Docker examples where not required
Copilot AI review requested due to automatic review settings April 30, 2026 19:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 67 out of 75 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (14)

varsome_api/client.py:1

  • aiohttp.ClientSession is being constructed with self.api_url as a positional argument. In aiohttp, the base URL is expected via the base_url= keyword; passing a string positionally can be interpreted as a different parameter (or fail depending on aiohttp version), breaking all requests. Use aiohttp.ClientSession(base_url=self.api_url, headers=self._headers) in both places.
    varsome_api/client.py:1
  • aiohttp.ClientSession is being constructed with self.api_url as a positional argument. In aiohttp, the base URL is expected via the base_url= keyword; passing a string positionally can be interpreted as a different parameter (or fail depending on aiohttp version), breaking all requests. Use aiohttp.ClientSession(base_url=self.api_url, headers=self._headers) in both places.
    varsome_api/client.py:1
  • isinstance(items, AsyncIterable) will raise TypeError at runtime when AsyncIterable comes from typing (typing generics are not valid in isinstance checks). Import AsyncIterable from collections.abc (or check for __aiter__) to make this safe.
    varsome_api/client.py:1
  • isinstance(items, AsyncIterable) will raise TypeError at runtime when AsyncIterable comes from typing (typing generics are not valid in isinstance checks). Import AsyncIterable from collections.abc (or check for __aiter__) to make this safe.
    varsome_api/client.py:1
  • The client advertises zstd in Accept-Encoding, but aiohttp does not natively decode zstd content encoding in many configurations. If the server selects zstd, responses may fail to decode. Consider removing zstd unless you have guaranteed decoding support.
    varsome_api/client.py:1
  • _make_request is annotated to return dict[str, Any], but batch endpoints return a JSON list (and callers already treat it as list[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throw aiohttp.ContentTypeError / JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type to Any (or dict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else read response.text()).
    varsome_api/client.py:1
  • _make_request is annotated to return dict[str, Any], but batch endpoints return a JSON list (and callers already treat it as list[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throw aiohttp.ContentTypeError / JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type to Any (or dict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else read response.text()).
    varsome_api/client.py:1
  • _make_request is annotated to return dict[str, Any], but batch endpoints return a JSON list (and callers already treat it as list[dict]). Also, error handling assumes the body is always JSON (await response.json()), which can throw aiohttp.ContentTypeError / JSON decode errors (e.g., empty body, text/html, text/plain). Update the return type to Any (or dict[str, Any] | list[Any]) and add a robust fallback for error/success parsing (try JSON, else read response.text()).
    varsome_api/client.py:1
  • The log message contains a grammatical typo: "genes(s)" should be "gene(s)" (or rephrase to avoid the awkward pluralization).
    varsome_api/vcf.py:1
  • variant_result.original_variant is nullable on the slim model, but annotate_record expects a str. Passing None here can lead to INFO fields like original_variant being written as None (or raising when writing). Use the request query string (variant) as the fallback: variant_result.original_variant or variant.
    varsome_api/vcf.py:1
  • This unconditionally overwrites record.pos/record.alleles even when the API returns None for pos/ref/alt, despite the docstring stating these are only updated when non-None. Concretely: setting record.pos = None can raise, and inserting "." into alleles can produce invalid alleles in output. Only update pos/alleles when the corresponding API fields are not None, otherwise keep the original record values.
    varsome_api/models/variant.py:1
  • list(set(...)) returns genes in a non-deterministic order, which can create noisy diffs and unstable output (e.g., VCF INFO ordering). Prefer a deterministic approach (e.g., preserve first-seen order via dict.fromkeys(...), or sorted(set(...)) if ordering is not important).
    varsome_api/exceptions.py:1
  • When status is None, the string becomes None (Connection refused) / None (Unknown error.), which is awkward for end users and logs. Consider omitting the status prefix entirely when status is None (e.g., return just the detail), while keeping the current format for HTTP status errors.
    varsome_api/cli/utils.py:1
  • configure_logging() adds a new StreamHandler every time it’s called, which can duplicate log lines in repeated invocations (tests, multiple CLI entry points, or library consumers). Guard against duplicate handlers (e.g., only add if no similar handler exists, or clear existing handlers first).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Newer python versions not supported

2 participants