Remove $EGGNOG_DBMEM from eggnog env vars #104

Open
cat-bro wants to merge 1 commit into main from cat-bro-patch-5

@cat-bro
Collaborator

@cat-bro cat-bro commented Jan 30, 2026

Galaxy Australia gives each eggnog job 32GB of RAM. The presence of `--dbmem` (introduced recently) in the command line causes the db to be loaded into memory. GA has strict OOM killing, and all eggnog jobs since this change have been OOM killed. I’m about to try overriding this in our local config.
@cat-bro
Collaborator Author

cat-bro commented Jan 30, 2026

I think this is the right move. After removing the env var on Galaxy Australia, the jobs are OK. The db file in CVMFS is 40GB+, so it cannot fit in the 32GB we allocate. Even if we allocated enough memory, it takes a long time to load into memory and would add runtime for a lot of jobs.

@cat-bro cat-bro requested a review from bgruening January 30, 2026 13:21
@bgruening
Member

Should we maybe include a rule that enables this option depending on the number of sequences? We use this option because most of the inputs we see have many hundreds or thousands of sequences, and our understanding is that in those cases it dramatically reduces disk I/O and runtime.

@bgruening
Member

Double-checking: are you also using version 5.0.2 of the database?

@cat-bro
Collaborator Author

cat-bro commented Jan 30, 2026

> Double-checking: are you also using version 5.0.2 of the database?

Yes

> Should we maybe include a rule that enables this option depending on the number of sequences? We use this option because most of the inputs we see have many hundreds or thousands of sequences, and our understanding is that in those cases it dramatically reduces disk I/O and runtime.

Sounds good. We’ll need to give these jobs enough memory for `--dbmem` under this rule.
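
For illustration, the decision such a rule would encode can be sketched in plain Python (not TPV config). The only numbers taken from this thread are the 32GB allocation and the 40GB+ database size. The 1000-sequence threshold, the 64GB figure, and the helper name `choose_eggnog_resources` are placeholders invented for this sketch, and the value given to `EGGNOG_DBMEM` is assumed from the flag name.

```python
from typing import Optional

# From this thread: what Galaxy Australia gives each eggnog job.
DEFAULT_MEM_GB = 32
# Placeholders for illustration only; tune per site.
DBMEM_MEM_GB = 64          # assumed: must cover the 40GB+ database plus headroom
SEQUENCE_THRESHOLD = 1000  # assumed cut-off for "many sequences"


def choose_eggnog_resources(num_sequences: Optional[int]) -> dict:
    """Return the memory (GB) and env vars a job would get under the rule."""
    # Sequence metadata can be missing; fall back to the low-memory path.
    if num_sequences is None or num_sequences < SEQUENCE_THRESHOLD:
        return {"mem": DEFAULT_MEM_GB, "env": {}}
    # Large inputs: load the database into memory and allocate enough RAM for it.
    return {"mem": DBMEM_MEM_GB, "env": {"EGGNOG_DBMEM": "--dbmem"}}


print(choose_eggnog_resources(50))
print(choose_eggnog_resources(5000))
```

The point of the design is that small jobs never pay the cost of loading the database, while large jobs only get `--dbmem` together with a memory allocation that can actually hold it.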

@cat-bro
Collaborator Author

cat-bro commented Jan 30, 2026

Ages ago we were contemplating a sequence check for blast and never got around to using it. This worked, though I’m not sure if `pop()` is the best operation.

toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/.*:
    context:
      fasta_sequence_limit: 10000
    rules:
    - if: |
        input_fasta_dataset = job.input_datasets.pop()
        num_sequences = input_fasta_dataset.dataset.metadata.sequences
        num_sequences > fasta_sequence_limit
      fail: "Fasta input exceeds limit of {fasta_sequence_limit}. Email help@genome.edu.au if you think this is in error, or for some advice"
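
On the `pop()` question: if `job.input_datasets` behaves like an ordinary Python list (an assumption; in Galaxy it is an ORM-backed collection), `pop()` removes the last entry from it and inspects only that one entry. A non-mutating alternative is to scan every input and take the largest sequence count. The sketch below uses stand-in objects built with `SimpleNamespace`, not real Galaxy objects. `fake_input` and `max_sequences` are hypothetical helpers, and the attribute path `dataset.metadata.sequences` is copied from the rule above.

```python
from types import SimpleNamespace


def fake_input(sequences):
    """Build a stand-in for one entry of job.input_datasets (not a Galaxy object)."""
    metadata = SimpleNamespace(sequences=sequences)
    return SimpleNamespace(dataset=SimpleNamespace(metadata=metadata))


def max_sequences(input_datasets):
    """Largest sequence count across inputs, without modifying the list."""
    counts = [
        getattr(entry.dataset.metadata, "sequences", None)
        for entry in input_datasets
        if entry.dataset is not None
    ]
    # Ignore inputs that carry no sequence metadata (e.g. non-FASTA datasets).
    counts = [c for c in counts if c is not None]
    return max(counts, default=0)


inputs = [fake_input(12000), fake_input(None), fake_input(300)]

# Scanning leaves the list untouched and finds the largest input.
print(max_sequences(inputs), len(inputs))

# pop() looks at the last entry only and shrinks the list as a side effect.
last = inputs.pop()
print(last.dataset.metadata.sequences, len(inputs))
```

Inside a rule the same idea could probably be written as a single expression over `job.input_datasets`, but that has not been tested against TPV here.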

@bgruening
Member

bgruening commented Jan 30, 2026

Oh interesting, I didn't know about this use of the context. Yes, let's use that. This is cool!
