Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748) by NJX-njx · Pull Request #782 · microsoft/superbenchmark

NJX-njx · 2026-03-03T14:11:47Z

Summary

Fixes #748 - NVBandwidth benchmark results parsing bug causing invalid metric names like _sum_None\ and \device_to_device_latency_sm_sum_None.

Root Cause

When parsing SUM lines, \parse_status['test_name']\ or \parse_status['benchmark_type']\ could be empty/None (e.g., after waived tests or when header parsing fails), leading to invalid metric keys.

Changes

Add validation: only add SUM result when both \ est_name\ and \�enchmark_type\ are valid
Use test name from SUM line (\match.group(1)) as fallback when \parse_status\ is stale
Infer \�enchmark_type\ from test name when not set: 'latency' in name -> 'lat', else 'bw'

Testing

Logic verified against \ ests/data/nvbandwidth_results.log\ format
Unit tests require CUDA platform (CI will run on Linux)

…ency_sm_sum_None - Add validation before adding SUM results: only add when test_name and benchmark_type are valid - Use test name from SUM line as fallback when parse_status is stale (e.g., after waived tests) - Infer benchmark_type from test name when not set (latency tests -> lat, else bw) Fixes microsoft#748 Made-with: Cursor

Copilot

Pull request overview

Fixes NVBandwidth SUM-line parsing to prevent invalid metric keys (e.g., _sum_None) by making SUM parsing more resilient when parsing state is missing or stale.

Changes:

Use the SUM line’s captured test name as a fallback when parse_status['test_name'] is empty.
Infer benchmark_type from the test name when it wasn’t detected from the matrix header.
Guard SUM metric emission to only occur when both test_name and benchmark_type are valid.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-03T14:15:10Z

        if self.re_summary_pattern.match(line):
-            value = self.re_summary_pattern.match(line).group(2)
-            test_name = parse_status['test_name']
+            match = self.re_summary_pattern.match(line)
+            value = match.group(2)


re_summary_pattern.match(line) is evaluated twice (once in the if and again to assign match). Consider assigning once (e.g., match = ...; if match:) to avoid duplicated work and keep the control flow simpler.

@NJX-njx Could you please resolve the comments from Copilot? Thanks.

Copilot · 2026-03-03T14:15:10Z

+            # Use test_name from parse_status, fallback to group(1) from SUM line
+            test_name = parse_status['test_name'] or match.group(1).lower()
            benchmark_type = parse_status['benchmark_type']
-            parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
+            # Infer benchmark_type from test_name if not set (e.g., after waived tests)
+            if benchmark_type is None:
+                benchmark_type = 'lat' if 'latency' in test_name else 'bw'
+            # Only add result when we have valid metric name (avoid _sum_None or sum_None)
+            if test_name and benchmark_type:
+                parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)


The new fallback/inference paths for SUM parsing (using match.group(1) when parse_status['test_name'] is empty, and inferring benchmark_type when it's None) aren’t covered by existing nvbandwidth tests (current tests only exercise the full header+matrix path). Adding a unit test with a minimal raw output containing a SUM ... line but no prior matrix header (and/or no preceding Running ...) would help prevent regressions of the _sum_None metric-name bug.

polarG · 2026-05-29T18:38:07Z

@@ -190,10 +190,17 @@ def _process_raw_line(self, line, parse_status):



re_summary_pattern.match(line) is still evaluated twice — once in this if, then again on the line below to assign match. Copilot raised this in #discussion_r2878517701 and @guoshzhao explicitly asked for it to be addressed in #discussion_r3070415674; please collapse the two calls:

match = self.re_summary_pattern.match(line) if match: value = match.group(2) ...

polarG · 2026-05-29T18:38:22Z

+            test_name = parse_status['test_name'] or match.group(1).lower()
            benchmark_type = parse_status['benchmark_type']
-            parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
+            # Infer benchmark_type from test_name if not set (e.g., after waived tests)


This fallback fixes the SUM symptom but masks the true root cause of device_to_device_latency_sm_sum_None from issue #748.

Upstream nvbandwidth prints the matrix header for this test as:

Device to Device Latency SM GPU(row) <-> GPU(column) (ns)

(see third_party/nvbandwidth/testcases_sm.cpp:141). That line does NOT start with memcpy or memory latency, so re_matrix_header_line at line 19 never fires for this test. As a consequence:

parse_status['benchmark_type'] stays None.

metrix_row / metrix_col are never extracted.

The matrix-row branch at line 167 is guarded by parse_status['benchmark_type'] and therefore silently skips all per-cell results for device_to_device_latency_sm.

With this PR, the SUM is rescued via the 'latency' in test_name heuristic, but the per-cell device_to_device_latency_sm_gpu0_gpu1_lat (etc.) results are still lost — users will only see the SUM, not the matrix.

A more complete fix is to broaden the header detector itself, e.g.:

re_matrix_header_line = re.compile(r'^(memcpy|memory latency|Device to Device Latency)', re.IGNORECASE)

The existing branch 'bw' if 'bandwidth' in line else 'lat' already classifies correctly for both new and old header lines, so the fallback + inference here can stay as a defense-in-depth net while the regex change recovers the lost matrix data.

polarG · 2026-05-29T18:38:34Z

-            parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
+            # Infer benchmark_type from test_name if not set (e.g., after waived tests)
+            if benchmark_type is None:
+                benchmark_type = 'lat' if 'latency' in test_name else 'bw'


When this guard drops a SUM line (either test_name is empty or benchmark_type is still falsy after inference), nothing is logged. That trades a malformed metric name for silent data loss, which makes the next regression (a new upstream output format change) hard to detect from a passing run.

Please emit a warning here so the drop is observable, e.g.:

if test_name and benchmark_type: parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value) else: logger.warning( f'nvbandwidth: skipping SUM line, incomplete parse state ' f'(test_name={test_name!r}, benchmark_type={benchmark_type!r}, line={line!r})' )

The warning also gives users a clear signal in the runner logs when a future nvbandwidth release adds a test whose header line doesn't match the existing detection patterns.

polarG · 2026-05-29T18:38:50Z

-            value = self.re_summary_pattern.match(line).group(2)
-            test_name = parse_status['test_name']
+            match = self.re_summary_pattern.match(line)
+            value = match.group(2)


None of the three new behaviors here (or match.group(1).lower() fallback, 'latency' in test_name inference, if test_name and benchmark_type guard) is exercised by the existing test suite — tests/data/nvbandwidth_results.log and test_nvbandwidth.py::test_nvbandwidth_result_parsing_real_output only cover the happy memcpy ... / memory latency ... header path. Without a regression test, a future refactor can silently undo the fix from issue #748.

Copilot already flagged this in #discussion_r2878517758; please add a unit test using a synthetic raw output that triggers each new path. Minimal example exercising the device_to_device_latency_sm regression:

def test_nvbandwidth_sum_fallback_for_unknown_header(self): benchmark_name = 'nvbandwidth' (cls, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, Platform.CUDA) benchmark = cls(benchmark_name, parameters='') assert benchmark._preprocess() raw_output = ( 'Running device_to_device_latency_sm.\n' 'Device to Device Latency SM GPU(row) <-> GPU(column) (ns)\n' ' 0 1\n' ' 0 772.58 810.10\n' ' 1 810.10 772.58\n' '\n' 'SUM device_to_device_latency_sm 3165.36\n' ) assert benchmark._process_raw_result(0, raw_output) assert benchmark.result['device_to_device_latency_sm_sum_lat'][0] == 3165.36 # \u2191 with the suggested regex broadening (see [comment on L198](https://github.com/microsoft/superbenchmark/pull/782#discussion_r3326258884)), per-cell metrics should also be present.

A second case covering the test_name == '' and benchmark_type is None drop path (asserting the SUM is skipped and a warning is logged) would also be valuable, especially if the logger.warning suggestion on L200 is adopted.

polarG · 2026-06-01T01:51:57Z

Hi @NJX-njx, could address the current comments? We are planning to merge this PR. Thanks!

Copilot AI review requested due to automatic review settings March 3, 2026 14:11

NJX-njx requested a review from a team as a code owner March 3, 2026 14:11

Copilot started reviewing on behalf of NJX-njx March 3, 2026 14:12 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

guoshzhao changed the title ~~Fix NVBandwidth benchmark results parsing bug (#748)~~ Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748) Mar 11, 2026

guoshzhao requested review from guoshzhao and polarG March 11, 2026 16:16

guoshzhao self-assigned this Mar 11, 2026

guoshzhao added bug Something isn't working benchmarks SuperBench Benchmarks micro-benchmarks Micro Benchmark Test for SuperBench Benchmarks labels Mar 11, 2026

polarG reviewed May 29, 2026

View reviewed changes

Merge branch 'main' into fix/nvbandwidth-parsing-bug-748

b7fed65

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748)#782

Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748)#782
NJX-njx wants to merge 2 commits into
microsoft:mainfrom
NJX-njx:fix/nvbandwidth-parsing-bug-748

NJX-njx commented Mar 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 3, 2026

Uh oh!

guoshzhao Apr 13, 2026

Uh oh!

Copilot AI Mar 3, 2026

Uh oh!

polarG May 29, 2026

Uh oh!

polarG May 29, 2026

Uh oh!

polarG May 29, 2026

Uh oh!

polarG May 29, 2026

Uh oh!

polarG commented Jun 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		@@ -190,10 +190,17 @@ def _process_raw_line(self, line, parse_status):

Conversation

NJX-njx commented Mar 3, 2026

Summary

Root Cause

Changes

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

guoshzhao Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

polarG May 29, 2026

Choose a reason for hiding this comment

Uh oh!

polarG May 29, 2026

Choose a reason for hiding this comment

Uh oh!

polarG May 29, 2026

Choose a reason for hiding this comment

Uh oh!

polarG May 29, 2026

Choose a reason for hiding this comment

Uh oh!

polarG commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

polarG commented Jun 1, 2026 •

edited

Loading