fix(config): scope worker-memory budget to per-node by FileSystemGuy · Pull Request #24 · mlcommons/DLIO_local_changes

FileSystemGuy · 2026-06-16T21:52:17Z

Problem

The memory-budget guard in ConfigArguments.validate (dlio_benchmark/utils/config.py:391-418) compares a global worker count against one node's RAM:

total_workers = self.read_threads * self.comm_size                # global
BUDGET_MB = psutil.virtual_memory().total // (1024 * 1024)        # local
estimated_mb = per_worker_mb * total_workers
if estimated_mb > BUDGET_MB:
    max_threads = BUDGET_MB // per_worker_mb // max(1, self.comm_size)

Operands are in incompatible units, so valid multi-node configurations are rejected. The suggested max_threads then collapses to 0 at ~100 nodes — the user is left with no valid setting.

Reporter's table:

Nodes	comm_size	"estimated"	Actual per-node need	Old verdict	Reality
1	12	96 GB	96 GB	passes	fits
4	48	384 GB	96 GB	rejects ("max_threads=10")	fits
100	1,200	9.6 TB	96 GB	rejects ("max_threads=0")	fits

The 50%-of-available warning further down has the same bug.

Fix

Worker count is per-node: read_threads × DLIOMPI.ranks_per_node(). ranks_per_node() already exists in utils/utility.py:309 and is already used by the auto-sizing path at config.py:774 — this PR aligns the memory check with that established pattern. In CHILD_INITIALIZED state it conservatively falls back to comm_size.
Budget basis is psutil.virtual_memory().available with a 90% safety margin so already-used RAM is respected. The previous .total basis blocked large machines for fresh RAM that was never going to be there.
Error and warning text now name the host, report local_ranks instead of comm_size, and the max_threads suggestion is derived from per-node arithmetic.

Worked example from the issue: 100 nodes × 12 ranks × read_threads=16, 256 GiB/node — per-node demand is 96 GB (16 × 12 × 0.5), well under budget. Old code computes 9.6 TB and rejects. New code computes 96 GB and passes.

Test plan

Manual: 1-node config that exceeds local RAM still raises with the new message.
Manual: 4-node × 12-ranks × read_threads=16 on 256 GiB nodes runs end-to-end with no false rejection.
Manual: confirm warning text reads "On host X: ... local_ranks=N = M workers ... on this node".

The memory-budget guard in ConfigArguments.validate compared a global worker count (read_threads × comm_size) against one node's RAM (psutil.virtual_memory().total). Operands in incompatible units, so valid multi-node configurations were rejected with a misleading "reduce read_threads to N" suggestion where N collapsed to 0 at ~100 nodes — leaving the user with no valid setting. Changes: - Worker count is now per-node: read_threads × DLIOMPI.ranks_per_node(). ranks_per_node() already exists (used by the auto-sizing path, config.py:774) and falls back to comm_size in CHILD_INITIALIZED state, matching the conservative behavior elsewhere in this file. - Budget basis is now psutil.virtual_memory().available with a 90% safety margin (so already-used RAM is respected). The previous .total basis blocked large machines for fresh RAM that was never going to be there. - Error and warning messages now name the host, report local_ranks instead of comm_size, and the max_threads suggestion is derived from per-node arithmetic. - The 50% available-RAM warning gets the same per-node treatment. Worked example from the issue: 100 nodes × 12 ranks × read_threads=16, 256 GiB/node — per-node demand is 96 GB (16 × 12 × 0.5), well under budget. Old code computed 9.6 TB against 256 GB and rejected. New code computes 96 GB against 256 GB and passes. Refs mlcommons/storage#448

FileSystemGuy · 2026-06-16T21:57:07Z

@russfellows @idevasena
Don't think too deeply about this, unless you want to, just approve it. :-)

russfellows

Curtis OK'd.

FileSystemGuy requested a review from a team June 16, 2026 21:52

FileSystemGuy mentioned this pull request Jun 16, 2026

fix: bump DLIO pin for storage #391 (sudo prompt) + #448 (per-node memory budget) mlcommons/storage#462

Merged

5 tasks

russfellows approved these changes Jun 17, 2026

View reviewed changes

russfellows merged commit 1d11f98 into main Jun 17, 2026
7 checks passed

russfellows deleted the FileSystemGuy-issue448-per-node-memory-budget branch June 17, 2026 15:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(config): scope worker-memory budget to per-node#24

fix(config): scope worker-memory budget to per-node#24
russfellows merged 1 commit into
mainfrom
FileSystemGuy-issue448-per-node-memory-budget

FileSystemGuy commented Jun 16, 2026

Uh oh!

FileSystemGuy commented Jun 16, 2026

Uh oh!

russfellows left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FileSystemGuy commented Jun 16, 2026

Problem

Fix

Test plan

Uh oh!

FileSystemGuy commented Jun 16, 2026

Uh oh!

russfellows left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants