fix(config): scope worker-memory budget to per-node #24
Merged
The memory-budget guard in ConfigArguments.validate compared a global worker count (read_threads × comm_size) against one node's RAM (psutil.virtual_memory().total). The operands were in incompatible units, so valid multi-node configurations were rejected with a misleading "reduce read_threads to N" suggestion where N collapsed to 0 at ~100 nodes, leaving the user with no valid setting.

Changes:
- Worker count is now per-node: read_threads × DLIOMPI.ranks_per_node(). ranks_per_node() already exists (used by the auto-sizing path, config.py:774) and falls back to comm_size in CHILD_INITIALIZED state, matching the conservative behavior elsewhere in this file.
- Budget basis is now psutil.virtual_memory().available with a 90% safety margin (so already-used RAM is respected). The previous .total basis blocked large machines for fresh RAM that was never going to be there.
- Error and warning messages now name the host, report local_ranks instead of comm_size, and the max_threads suggestion is derived from per-node arithmetic.
- The 50% available-RAM warning gets the same per-node treatment.

Worked example from the issue: 100 nodes × 12 ranks × read_threads=16, 256 GiB/node. Per-node demand is 96 GB (16 × 12 × 0.5), well under budget. Old code computed 9.6 TB against 256 GB and rejected. New code computes 96 GB against 256 GB and passes.

Refs mlcommons/storage#448
@russfellows @idevasena
Addresses mlcommons/storage#448.
Problem
The memory-budget guard in ConfigArguments.validate (dlio_benchmark/utils/config.py:391-418) compares a global worker count (read_threads × comm_size) against one node's RAM (psutil.virtual_memory().total). The operands are in incompatible units, so valid multi-node configurations are rejected. The suggested max_threads then collapses to 0 at ~100 nodes, so the user is left with no valid setting.
Reporter's table:
The 50%-of-available warning further down has the same bug.
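The unit mismatch can be reproduced with plain arithmetic. The sketch below is an illustration only, not the code from config.py: the 0.5 GB per-worker figure is the one implied by the worked example in this PR, and the floor-division form of the max_threads suggestion is an assumption about how the old hint was derived. The function names are hypothetical.

```python
# Illustrative arithmetic only. Names and formula shapes are assumptions,
# not copied from dlio_benchmark/utils/config.py.
PER_WORKER_GB = 0.5   # per-worker estimate implied by the PR's worked example
NODE_RAM_GB = 256     # RAM of a single node

def old_demand_gb(read_threads: int, comm_size: int) -> float:
    """Global worker count times per-worker memory (the mismatched operand)."""
    return read_threads * comm_size * PER_WORKER_GB

def old_suggested_max_threads(comm_size: int, node_ram_gb: float = NODE_RAM_GB) -> int:
    """Assumed shape of the old hint: one node's RAM spread across all ranks."""
    return int(node_ram_gb // (comm_size * PER_WORKER_GB))

comm_size = 100 * 12                          # 100 nodes x 12 ranks per node
demand = old_demand_gb(16, comm_size)
print(demand)                                 # 9600.0 GB, i.e. 9.6 TB
print(demand > NODE_RAM_GB)                   # True: the configuration is rejected
print(old_suggested_max_threads(comm_size))   # 0: no valid setting is left
```

Under these assumptions the same formula on a single node (comm_size = 12) suggests up to 42 threads, which shows the hint only breaks down once comm_size spans many nodes.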
Fix
- Worker count is now per-node: read_threads × DLIOMPI.ranks_per_node(). ranks_per_node() already exists in utils/utility.py:309 and is already used by the auto-sizing path at config.py:774; this PR aligns the memory check with that established pattern. In CHILD_INITIALIZED state it conservatively falls back to comm_size.
- Budget basis is now psutil.virtual_memory().available with a 90% safety margin so already-used RAM is respected. The previous .total basis blocked large machines for fresh RAM that was never going to be there.
- Error and warning messages now name the host, report local_ranks instead of comm_size, and the max_threads suggestion is derived from per-node arithmetic.

Worked example from the issue: 100 nodes × 12 ranks × read_threads=16, 256 GiB/node. Per-node demand is 96 GB (16 × 12 × 0.5), well under budget. Old code computes 9.6 TB and rejects. New code computes 96 GB and passes.
Test plan
- read_threads=16 on 256 GiB nodes runs end-to-end with no false rejection.
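For reviewers who want to check the per-node arithmetic by hand, here is a minimal sketch of a guard of this shape. It is not the merged implementation: check_node_budget, its parameters, and the message wording are hypothetical, and the 0.5 GB per-worker figure is taken from the worked example. Available memory is passed in as an argument so the function can be exercised without a real cluster; in practice it could be derived from psutil.virtual_memory().available.

```python
# Hypothetical sketch of a per-node memory guard; not the merged code.
import socket

PER_WORKER_GB = 0.5    # per-worker estimate implied by the PR's worked example
SAFETY_MARGIN = 0.9    # budget is 90% of currently available RAM

def check_node_budget(read_threads, local_ranks, available_gb, host=None):
    """Return (ok, message, max_threads) for a single node.

    local_ranks is the number of ranks on this node, not the global comm_size.
    available_gb is supplied by the caller to keep the function testable.
    """
    host = host or socket.gethostname()
    budget_gb = available_gb * SAFETY_MARGIN
    demand_gb = read_threads * local_ranks * PER_WORKER_GB
    # Largest read_threads value that still fits the per-node budget.
    max_threads = int(budget_gb // (local_ranks * PER_WORKER_GB))
    if demand_gb > budget_gb:
        msg = (f"{host}: read_threads={read_threads} x local_ranks={local_ranks} "
               f"needs {demand_gb:.1f} GB, budget is {budget_gb:.1f} GB; "
               f"reduce read_threads to {max_threads}")
        return False, msg, max_threads
    msg = f"{host}: {demand_gb:.1f} GB is within the {budget_gb:.1f} GB budget"
    return True, msg, max_threads

# Worked example, assuming (optimistically) that all 256 GB are available.
ok, msg, max_threads = check_node_budget(16, 12, available_gb=256, host="node001")
print(ok, max_threads)   # True 38
print(msg)
```

Because the divisor is local_ranks rather than comm_size, the suggested max_threads no longer depends on how many nodes take part in the run.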