Add InstaNovo and InstaNovo+ import support#73

Open
BioGeek wants to merge 10 commits into CompOmics:master from BioGeek:instanovo
BioGeek commented Jun 22, 2026

InstaNovo is a de novo peptide sequencing tool.

This PR adds support for normalized CSV output from InstaNovo v1.2.2 in three modes:

  • InstaNovo: de novo peptide sequencing with the transformer-based InstaNovo model.
  • InstaNovo+: de novo peptide sequencing with the diffusion-based InstaNovo+ model.
  • InstaNovo with refinement: InstaNovo predictions refined with InstaNovo+.

This PR is the base for a series of related PRs across the CompOmics ecosystem, listed under Merge Order.

Changes

  • Adds built-in advocates for:
    • InstaNovo
    • InstaNovo+
    • InstaNovo with refinement
  • Adds identification-file readers for:
    • .instanovo.csv
    • .instanovoplus.csv
    • .instanovo.refined.csv
  • Adds support for normalized InstaNovo v1.2.2 CSV columns.
  • Adds support for the default UniMod annotations emitted by InstaNovo's default residue configuration.
  • Adds sample-row based tests for InstaNovo, InstaNovo+, and refined output.
  • Improves spectrum-title matching for real MGF/mzML titles.
  • Handles whitespace-padded precursor charges and skips rows with invalid charges.
  • Adds shared InstaNovo parameter classes used by SearchGUI advanced settings.
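
As an illustration of how the three file endings listed above could be told apart, here is a minimal sketch. The class `InstaNovoFileKind`, the enum `Mode`, and the method `detect` are hypothetical names chosen for this example and are not the identifiers used in this PR. Exact, case-sensitive suffix matching is also an assumption.

```java
import java.util.Optional;

public class InstaNovoFileKind {

    /** Hypothetical enum: one entry per file ending listed in this PR. */
    public enum Mode {
        INSTANOVO(".instanovo.csv"),
        INSTANOVO_PLUS(".instanovoplus.csv"),
        INSTANOVO_REFINED(".instanovo.refined.csv");

        final String suffix;

        Mode(String suffix) {
            this.suffix = suffix;
        }
    }

    /**
     * Returns the mode whose suffix the file name ends with, or empty if
     * none matches. The three suffixes cannot shadow each other, because
     * none of them is a trailing part of another.
     */
    public static Optional<Mode> detect(String fileName) {
        for (Mode mode : Mode.values()) {
            if (fileName.endsWith(mode.suffix)) {
                return Optional.of(mode);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("run1.instanovo.refined.csv"));
        System.out.println(detect("run1.mgf"));
    }
}
```

Keeping the refined ending as its own entry mirrors the choice described in this PR to represent the refined workflow separately from standalone InstaNovo+.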

Supporting Fixes

Shared fixes needed for the InstaNovo de novo end-to-end flow (SearchGUI → PeptideShaker):

  • Declares the ThermoRawFileParser mzML output ending as .mzML (the canonical extension the tool writes) so converted spectra are found on case-sensitive file systems.
  • Aligns log4j-api to 2.25.4 to match the bumped log4j-core, fixing a runtime NoSuchFieldError.
  • Suppresses (logs but does not pop a dialog for) the benign Nimbus look-and-feel ClassCastException (ColorUIResource cannot be cast to Boolean) thrown while building chart popup menus on recent JDKs.

Implementation Notes

  • The refined workflow is represented separately from standalone InstaNovo+ so downstream tools can distinguish both result types.
  • Refined output reports software metadata for the refined workflow and the underlying InstaNovo/InstaNovo+ tools.
  • Spectrum title lookup is cached per spectrum file during CSV import.
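
The per-file caching of spectrum title lookups could look roughly like the sketch below. It is not the code in this PR: the class `SpectrumTitleCache`, the `titleLoader` function, and the choice to key titles by the number in a `scan=` token are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpectrumTitleCache {

    // Assumed title shape: "... scan=1234 ..."
    private static final Pattern SCAN_TOKEN = Pattern.compile("scan=(\\d+)");

    // Hypothetical loader: returns all spectrum titles of a spectrum file.
    private final Function<String, List<String>> titleLoader;

    // spectrum file name -> (scan number -> full spectrum title)
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    private int indexBuilds = 0;

    public SpectrumTitleCache(Function<String, List<String>> titleLoader) {
        this.titleLoader = titleLoader;
    }

    /** Indexes titles by scan number; titles without a scan= token are skipped. */
    static Map<String, String> indexByScan(List<String> titles) {
        Map<String, String> index = new HashMap<>();
        for (String title : titles) {
            Matcher matcher = SCAN_TOKEN.matcher(title);
            if (matcher.find()) {
                index.putIfAbsent(matcher.group(1), title);
            }
        }
        return index;
    }

    /** Returns the title for a scan, building the file's index only once. */
    public String titleForScan(String spectrumFile, String scan) {
        Map<String, String> index = cache.computeIfAbsent(spectrumFile, file -> {
            indexBuilds++;
            return indexByScan(titleLoader.apply(file));
        });
        return index.get(scan);
    }

    /** Number of times an index was built; useful to confirm the cache works. */
    public int indexBuilds() {
        return indexBuilds;
    }

    public static void main(String[] args) {
        SpectrumTitleCache titles = new SpectrumTitleCache(file -> Arrays.asList(
                "controllerType=0 controllerNumber=1 scan=7",
                "controllerType=0 controllerNumber=1 scan=8"));
        System.out.println(titles.titleForScan("run1.mgf", "8"));
        System.out.println(titles.titleForScan("run1.mgf", "7"));
        System.out.println("index builds: " + titles.indexBuilds());
    }
}
```

With a cache of this shape, each CSV row costs a single map lookup instead of a rescan of the spectrum file's titles.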

Tests

Ran:

mvn -Dtest=com.compomics.util.test.experiment.io.identifications.TestInstaNovoIdfileReader,com.compomics.util.test.parameters.identification.tool_specific.TestInstaNovoParameters test
mvn -DskipTests -Dmaven.javadoc.skip=true install

Covered by tests:

  • InstaNovo v1.2.2 CSV parsing.
  • InstaNovo+ v1.2.2 CSV parsing.
  • InstaNovo with refinement v1.2.2 CSV parsing.
  • default InstaNovo UniMod modification mapping.
  • realistic non-numeric spectrum titles with scan= tokens.
  • invalid and whitespace-padded precursor charges.
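
The charge handling covered by the last test case can be sketched as follows. This is an illustrative sketch, not the reader code in this PR: `PrecursorChargeParser` and `parseCharge` are hypothetical names, and treating non-numeric, empty, and non-positive values as invalid is an assumption about what "invalid" means here.

```java
import java.util.OptionalInt;

public class PrecursorChargeParser {

    /**
     * Parses a precursor charge cell from a CSV row.
     * Surrounding whitespace is ignored. An empty result signals that the
     * caller should skip the row rather than fail the whole import.
     */
    public static OptionalInt parseCharge(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            int charge = Integer.parseInt(raw.trim());
            // Assumption: only positive charges are considered valid.
            return charge > 0 ? OptionalInt.of(charge) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCharge(" 2 "));
        System.out.println(parseCharge("n/a"));
    }
}
```

Returning an empty `OptionalInt` instead of throwing keeps the skip decision with the caller, which can then log the row and continue.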

Merge Order

Recommended order:

  1. This PR
  2. Add SearchGUI support for InstaNovo and InstaNovo+ (searchgui#387)
  3. Add PeptideShaker import UI support for InstaNovo result files (peptide-shaker#570)
  4. Add DeNovoGUI compatibility APIs to Utilities (#74)
  5. Add InstaNovo support to DeNovoGUI (denovogui#53)
  6. Update SearchGUI recipe to 4.3.17 (bioconda/bioconda-recipes#66604)
  7. Update PeptideShaker recipe to 3.0.13 (bioconda/bioconda-recipes#66603)
  8. Expose InstaNovo workflows in online input form (barsnes-group/peptide-shaker-online#30)
  9. Bump PeptideShaker wrappers for InstaNovo support (galaxyproteomics/tools-galaxyp#828)

BioGeek added 3 commits June 22, 2026 22:36

  • On recent JDKs the Nimbus look and feel can throw a benign ClassCastException (ColorUIResource cannot be cast to Boolean in NimbusStyle.isOpaque) while building chart popup menus. The exception is still logged but no longer shown to the user, as it does not affect functionality.
  • ThermoRawFileParser writes mzML files with the canonical .mzML extension. Declaring the format ending as .mzML lets consumers find the converted file on case-sensitive file systems.
  • log4j-core was bumped to 2.25.4 while log4j-api stayed at 2.23.1, causing a NoSuchFieldError at runtime. Bump log4j-api to 2.25.4 to match.