Skip to content

[INTERNAL] OpenSwath diaPASEF Speedup #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 99 commits into
base: develop
Choose a base branch
from
Draft

Conversation

jcharkow
Copy link
Owner

@jcharkow jcharkow commented Nov 9, 2022

Description

3-4 fold speedup of OpenSwathWorkflow.

Background: Filtering spectra by their ion mobility is very computationally expensive. Currently DIA scores are "blind" to the ion mobility dimension and thus require spectra to be filtered by ion mobility. Since the spectra are not sorted by ion mobility (rather by m/z) this is very computationally expensive.

Solution: Instead of filtering spectra examine only the regions of interest and compute scores from these. This takes advantage of the fact that the spectra are sorted by their m/z.

This PR involves creating new functions and deprecating old ones however it should not affect the output.

Some of the functions augmented include:

  • integrateWindow + integrateWindows = Now support integration of an ion mobility enhanced window. Accomplishes this adding drift_start and drift_end arguments. This family of function all rely on a new function _integrateWindowHelper(). This function can now integrate across a vector of spectra. This means that the spectra do not need to be added up by the expensive _addUpSpectra() function.
  • add DIA Scoring functions can now handle ion mobility
  • computeIonMobilogram() (from IonMobilityScoring) = Computes ion mobilogram in more efficient manner than previous filterByDriftSpectrum
  • fetchSpectrumSwath now does not automatically call _addUpSpectra() (which can be quite computationally expensive with ion mobility as it calls filterByDrift())

Some Functions Removed include:

  • DIAHelpers::integrateDriftSpectrum = not needed because the base function now supports integrating spectra.

Tests:

  • integrateWindow tests were added.
  • OpenSwathTest 23_b was added to test the -add_up_spectra flag

Issues

  • With the new implementation this removes support for resampling spectra addition with ion mobility
    • Possibly deprecate resampling spectra?
  • Currently SONAR still uses the old implementation of adding up spectra

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

Click to expand If your PR is failing you can check out
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.openms.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

Click to expand
  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

jcharkow and others added 30 commits November 2, 2022 15:58
Try to forgo filtering by ion mobility by
just computing scores directly from the IM spectra.

At the same time try to avoid adding up spectra by computing scores
directly from a list of spectra

It is still quite buggy and does not produce expected results.
before would get some wonky values because integrate spectrum would not
stop at boundaries of the vector.
accidentally added tmp files, delete them here
Instead of filtering and adding the entire spectra just go about looking
for the regions of interest
added error checks to ensure that IM is present before scoring.
this includes filterByDrift(), _getAddedSpectra() and old
integrateWindow() functions
Rename function signatures to be addresses instead of objects
restructure fetchSpectrumSwath() tests for the new implementation
Add back methods that previously took away as they will be used for
SONAR. SONAR is likely quite inefficient since using the old methods.
(Have to resample/add the entire spectrum not just select regions)
in integrateSpectrum() if drift time is not present set IM to -1
Update DIAPreScoring tests to include ion mobility tests

other minor changes to test scripts
tests new integrate window functions with/without ion mobility as well
as special cases (e.g. empty spectra, empty arrays, no windows provided
etc.)
Swath map map correction test because the window boundaries were so
large in one example that the IM boundary would be a value less than 0. To account for this change implementation of
DIAHelpers::adjustExtractionWindow to set the value to 0 if lower boundary
is a value less than 0.
this test has been replaced with test 23_b which tests the same concept
of adding up spectra
Co-authored-by: Hannes Roest <hannesroest@gmx.ch>
This test fails in debug mode because it is caught by an
OPENMS_PRECONDITION in fetchSpectrumSwath()
test was failing in debug mode because it was being caught by the fact
that there is no IM informaiton in the MS1 spectra. To avoid this, turn
off MS1 mode. This invovles updating the test files.

NOTE: Unsure if this is the best way to deal with this exception.
jcharkow and others added 30 commits August 26, 2023 16:11
try to get windows build working
- test chromatogram was not sorted when should have been
- Change names of unused variables
allow for spectrum sort by mz using without copying
eliminate usage of new minSpanIfSingular() function for ppm
minimizes the amount of definitions of the same name
fix to documentation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants