Skip to content

fix(diann2msstats): keep raw sequence when pyopenms can't resolve a mod#82

Open
ypriverol wants to merge 1 commit into
devfrom
fix/diann2msstats-resilient-pyopenms
Open

fix(diann2msstats): keep raw sequence when pyopenms can't resolve a mod#82
ypriverol wants to merge 1 commit into
devfrom
fix/diann2msstats-resilient-pyopenms

Conversation

@ypriverol
Copy link
Copy Markdown
Member

Summary

diann2msstats crashes the whole conversion when DIA-NN emits a peptide with a modification that pyopenms cannot resolve. This is the bug that surfaces as:

RuntimeError: the value 'Met-loss' was used but is not valid;
Cannot convert string to peptide modification.
No modification matches in our database.

…always preceded by:

UserWarning: OPENMS_DATA_PATH environment variable not found and
no share directory was installed.

The runtime container ships pyopenms without the OpenMS share directory (UniMod XML), so only a small set of compiled-in common mods (Carbamidomethyl, Oxidation, Phospho, …) resolves. Any DIA-NN run whose SDRF declares Acetyl, Met-loss, Met-loss+Acetyl, or any less-common UniMod entry crashes outright.

This PR makes the converter resilient to that state.

Changes

  • New helper _to_openms_sequence(seq) that wraps AASequence.fromString in a try/except RuntimeError, logs a warning, and returns the input unchanged on failure. The leading ^ anchor used by DIA-NN for N-terminal-cleavage peptides is preserved on both the canonical and fallback paths.
  • Dedupe the conversion: build a map over the unique PeptideSequence values and re-map back. The old .apply(..., axis=1) re-parsed identical strings on every row — wasteful on real (multi-million-row) DIA reports.
  • Two unit tests in TestDiannCommands:
    • unknown modification falls back to the raw input
    • ^ anchor preserved on both paths

Why the raw string is safe to pass through

MSstats only needs PeptideSequence to be a stable identifier per peptidoform. The DIA-NN string is already that — different mod forms produce different strings — so downstream conversion is unaffected. Only the cosmetic "canonical form" is lost for the unresolvable rows.

Test plan

pytest tests/test_commands.py::TestDiannCommands -v

All three relevant tests pass locally:

test_to_openms_sequence_falls_back_on_unknown_mod  PASSED
test_to_openms_sequence_preserves_n_term_anchor    PASSED
test_diann2msstats_example                         PASSED   (regression check)

Related upstream fix needed

The deeper fix is the bioconda recipe: quantms-utils should ship pyopenms with the OpenMS share directory (or set OPENMS_DATA_PATH) so the parser has the full UniMod database. Filing a separate issue against this repo to track the container packaging side; this source PR is the resilient companion that prevents future surprise mods from crashing the converter even after the recipe is fixed.

Reproduction context

Hit in the wild during a reanalysis of PXD004701 (Sun et al. 2023, MCP) whose SDRF correctly declared N-term Met-loss (UniMod:765). Same failure mode previously surfaced with Acetyl (UniMod:1) on other datasets. Repro details in the companion issue.

`diann2msstats` crashes the conversion when DIA-NN emits a peptide with
a modification that pyopenms cannot resolve, e.g.:

  RuntimeError: the value 'Met-loss' was used but is not valid;
  Cannot convert string to peptide modification.
  No modification matches in our database.

This fires in production whenever the runtime container ships pyopenms
without the OpenMS share directory (UniMod XML) — visible via the
warning that always precedes the traceback:

  UserWarning: OPENMS_DATA_PATH environment variable not found and
  no share directory was installed.

In that state only a small set of common mods (Carbamidomethyl,
Oxidation, Phospho, …) resolves from the compiled-in fallback, so
Acetyl, Met-loss, and other valid UniMod entries crash the whole job.
Any reanalysis whose SDRF declares such a mod is bricked.

Fix the source side: wrap the `AASequence.fromString` round-trip in
`_to_openms_sequence`, catch `RuntimeError`, log a warning, and pass
the raw DIA-NN string through unchanged. The raw string is still a
stable peptide identifier for MSstats (different mod forms => distinct
IDs), so downstream consumers are unaffected.

Also dedupe the conversion across rows: build a map over the unique
PeptideSequence values and re-map back. The old `.apply(... axis=1)`
re-parsed identical strings millions of times on real datasets.

Adds two unit tests against `_to_openms_sequence`:
  * unknown mod falls back to the raw input
  * `^` N-term anchor is preserved on both code paths

Note: the deeper fix is the container/recipe packaging — the bioconda
recipe for quantms-utils should ship pyopenms with its OpenMS share
directory so the parser has the full UniMod database. This source
patch is the resilient companion: even after the recipe is fixed, an
unexpected mod string should not crash the converter.
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bcfe6b1b-c89f-452a-998d-17ec7b4deeff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/diann2msstats-resilient-pyopenms

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

Not up to standards ⛔

🔴 Issues 1 minor

Alerts:
⚠ 1 issue (≤ 0 issues of at least minor severity)

Results:
1 new issue

Category Results
Documentation 1 minor

View in Codacy

🟢 Metrics 4 complexity

Metric Results
Complexity 4

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants