Add SearchGUI support for InstaNovo and InstaNovo+ #387

Open. BioGeek wants to merge 18 commits into CompOmics:master from BioGeek:instanovo

@BioGeek BioGeek commented Jun 22, 2026

InstaNovo is a de novo peptide sequencing tool.

This PR adds SearchGUI support for InstaNovo v1.2.2 in three modes:

  • InstaNovo: de novo peptide sequencing with the transformer-based InstaNovo model.
  • InstaNovo+: de novo peptide sequencing with the diffusion-based InstaNovo+ model.
  • InstaNovo with refinement: InstaNovo predictions refined with InstaNovo+.

Changes

  • Adds GUI entries for:

    • InstaNovo
    • InstaNovo+
    • InstaNovo with refinement
  • Adds GUI descriptions and documentation links for the new algorithms.

  • Adds advanced settings for:

    • InstaNovo transformer model
    • InstaNovo+ diffusion model
    • number of beams
    • standard vs knapsack beam search
    • save all beam predictions
    • batch size
    • optional config path
    • force CPU execution
  • Adds SearchCLI flags:

    • -instanovo
    • -instanovo_plus
    • -instanovo_refine
    • -instanovo_folder
  • Allows de novo-only InstaNovo runs without a FASTA database.

  • Keeps FASTA validation for database search engine runs.

  • Adds progress parsing for InstaNovo prediction output.

  • Handles non-zero InstaNovo process exits as failed runs.

  • Improves cancellation handling for external InstaNovo processes.

  • Adds documentation for local InstaNovo setup with uv.
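The process-handling items above (non-zero exits treated as failures, cancellation of external processes, and a clear error when a tool cannot be started) can be sketched roughly as follows. This is a minimal illustration built on the standard `ProcessBuilder` API, not the code in this PR: the class `ExternalToolRunSketch`, the `Outcome` enum, and both method names are hypothetical, and discarding the tool output is a simplification (the PR parses it for progress).

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper for illustration; not a class from the SearchGUI code base. */
final class ExternalToolRunSketch {

    /** Possible results of one external tool run (names are assumptions). */
    enum Outcome { SUCCESS, FAILED_TO_START, NON_ZERO_EXIT, CANCELLED }

    /** Any exit code other than zero is reported as a failed run. */
    static Outcome fromExitCode(int exitCode) {
        return exitCode == 0 ? Outcome.SUCCESS : Outcome.NON_ZERO_EXIT;
    }

    /** Starts the command, waits for it to finish, and classifies the result. */
    static Outcome run(List<String> command) {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Output is discarded here to keep the sketch short.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            // Report the cause right away instead of continuing with a null process.
            System.err.println("Could not start " + command.get(0) + ": " + e.getMessage());
            return Outcome.FAILED_TO_START;
        }
        try {
            return fromExitCode(process.waitFor());
        } catch (InterruptedException e) {
            // Treat an interrupt as a user cancellation and stop the child process.
            process.destroy();
            Thread.currentThread().interrupt();
            return Outcome.CANCELLED;
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            System.out.println(run(List.of(args)));
        }
    }
}
```

Returning an explicit outcome rather than a bare exit code lets the caller tell "never started" apart from "started and failed", which is the distinction the error-reporting change relies on.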

De Novo End-To-End With PeptideShaker

Follow-up changes so that a de novo-only run flows all the way through to PeptideShaker:

  • Requires a database only when a database search engine is selected; de novo-only runs (including those with PeptideShaker post-processing enabled) no longer demand a FASTA.
  • Omits the -fasta_file argument when launching PeptideShaker for a de novo-only run.
  • Uses the .mzML extension for ThermoRawFileParser output, matching what the tool actually writes, so that converted spectra are found on case-sensitive file systems (de novo engines read the spectrum file directly).
  • Reports a clear error and cancels the run when an external tool cannot be started (e.g. ThermoRawFileParser when mono is missing) instead of failing later with a NullPointerException on a null process.
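The database rule and the conditional -fasta_file argument can be illustrated with a short sketch. Only the -fasta_file flag comes from this PR; the class `DeNovoOnlyArgsSketch`, its method names, and the idea of passing the remaining PeptideShaker arguments in as `baseArgs` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not a class from the SearchGUI code base. */
final class DeNovoOnlyArgsSketch {

    private static boolean hasFasta(String fastaPath) {
        return fastaPath != null && !fastaPath.isEmpty();
    }

    /** A FASTA file is needed only when a database search engine is selected. */
    static boolean canStart(boolean databaseSearchEngineSelected, String fastaPath) {
        return !databaseSearchEngineSelected || hasFasta(fastaPath);
    }

    /** Copies the base arguments and appends -fasta_file only when a database is set. */
    static List<String> withOptionalFasta(List<String> baseArgs, String fastaPath) {
        List<String> args = new ArrayList<>(baseArgs);
        if (hasFasta(fastaPath)) {
            args.add("-fasta_file");
            args.add(fastaPath);
        }
        return args;
    }

    public static void main(String[] args) {
        // De novo-only run: no database, so no -fasta_file argument is added.
        System.out.println(canStart(false, null));
        System.out.println(withOptionalFasta(List.of("<other arguments>"), null));
    }
}
```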

Implementation Notes

  • SearchGUI runs InstaNovo from a local InstaNovo checkout.
  • The installation folder should be the InstaNovo repository root.
  • SearchGUI resolves the executable from .venv/bin/instanovo, then instanovo in the selected folder, then instanovo from PATH.
  • The three output types are kept separate:
    • .instanovo.csv
    • .instanovoplus.csv
    • .instanovo.refined.csv
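The lookup order and the three output suffixes described above could look roughly like this. The sketch is not the PR's implementation: `InstaNovoPathsSketch`, `Mode`, and both method names are hypothetical; the `isFile()` check (so that a directory named instanovo is not mistaken for the executable) is an assumption; and deriving the output name from the spectrum file's base name is also an assumption, since the PR text lists only the suffixes.

```java
import java.io.File;

/** Hypothetical helper for illustration; not a class from the SearchGUI code base. */
final class InstaNovoPathsSketch {

    /** The three run modes and the output suffix listed for each in the PR. */
    enum Mode {
        INSTANOVO(".instanovo.csv"),
        INSTANOVO_PLUS(".instanovoplus.csv"),
        REFINED(".instanovo.refined.csv");

        final String suffix;

        Mode(String suffix) {
            this.suffix = suffix;
        }
    }

    /** Tries .venv/bin/instanovo, then instanovo in the folder, then falls back to PATH. */
    static String resolveExecutable(File installationFolder) {
        File venv = new File(installationFolder, ".venv/bin/instanovo");
        if (venv.isFile() && venv.canExecute()) {
            return venv.getAbsolutePath();
        }
        File local = new File(installationFolder, "instanovo");
        if (local.isFile() && local.canExecute()) {
            return local.getAbsolutePath();
        }
        // A bare command name leaves the lookup to the operating system's PATH.
        return "instanovo";
    }

    /** Assumed naming scheme: spectrum file base name plus the mode-specific suffix. */
    static String outputFileName(String spectrumFileName, Mode mode) {
        int dot = spectrumFileName.lastIndexOf('.');
        String base = dot > 0 ? spectrumFileName.substring(0, dot) : spectrumFileName;
        return base + mode.suffix;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        System.out.println(resolveExecutable(folder));
        System.out.println(outputFileName("sample.mzML", Mode.REFINED));
    }
}
```

Keeping a distinct suffix per mode means that results from a plain InstaNovo run, an InstaNovo+ run, and a refinement run on the same spectrum file do not overwrite one another.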

Screenshots

[Screenshot From 2026-06-22 21-30-05]
[Screenshot From 2026-06-22 14-53-07]

Tests

Ran:

mvn -Dtest=eu.isas.searchgui.processbuilders.InstaNovoProcessBuilderTest,eu.isas.searchgui.cmd.SearchCLIInstaNovoTest,eu.isas.searchgui.gui.SearchGUIInstaNovoModelsTest test
mvn -DskipTests -Dmaven.javadoc.skip=true package

Merge Order

Recommended order:

  1. Add InstaNovo and InstaNovo+ import support (compomics-utilities#73)
  2. This PR
  3. Add PeptideShaker import UI support for InstaNovo result files (peptide-shaker#570)
  4. Add DeNovoGUI compatibility APIs to Utilities (compomics-utilities#74)
  5. Add InstaNovo support to DeNovoGUI (denovogui#53)
  6. Update SearchGUI recipe to 4.3.17 (bioconda/bioconda-recipes#66604)
  7. Update PeptideShaker recipe to 3.0.13 (bioconda/bioconda-recipes#66603)
  8. Expose InstaNovo workflows in online input form (barsnes-group/peptide-shaker-online#30)
  9. Bump PeptideShaker wrappers for InstaNovo support (galaxyproteomics/tools-galaxyp#828)

BioGeek added 3 commits June 22, 2026 22:36
…tabase

A database is only required when a database search engine is selected; de novo only runs (including PeptideShaker post-processing) can now proceed without one. The PeptideShaker launch omits the -fasta_file argument when no database is set.
When an external tool (such as ThermoRawFileParser) cannot be started, report the cause and cancel the run instead of failing later with a null process NullPointerException.
ThermoRawFileParser writes .mzML output; request and look for that exact extension so the converted file is found on case-sensitive file systems (otherwise de novo engines reading the spectrum file directly could not find it).
@BioGeek BioGeek force-pushed the instanovo branch 2 times, most recently from 752d105 to a1d9188 Compare June 22, 2026 20:54