
Fix dead condition, division-by-zero, and uninitialized members in LLM stats (#18819)

Open
kirklandsign wants to merge 1 commit into main from export-D99708774

Conversation

Contributor

@kirklandsign kirklandsign commented Apr 10, 2026

Summary:

Three issues fixed:

  1. text_llm_runner.cpp: The condition num_generated_tokens == max_new_tokens
    was always false because TextTokenGenerator::generate() receives
    max_new_tokens - 1. Fixed to compare against max_new_tokens - 1.

  2. stats.h print_report(): Division by zero when inference/prefill/decode
    time is zero (e.g., during very fast warmup runs). Added guards matching
    the pattern already used in stats_to_json_string().

  3. stats.h Stats: Added default initializers (= 0) to all timestamp and
    counter members to prevent undefined behavior from uninitialized reads.
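Fix 1 can be sketched as below. This is a minimal, hypothetical illustration, not the actual runner code: the real check lives in TextLLMRunner::generate(), and the helper name here is made up for the sketch.

```cpp
#include <cstdint>

// Hypothetical sketch of fix 1: the token generator is handed a budget of
// max_new_tokens - 1, so the "budget exhausted" check must compare against
// max_new_tokens - 1 as well.
bool reached_max_new_tokens(int64_t num_generated_tokens,
                            int64_t max_new_tokens) {
  // Before the fix this compared against max_new_tokens, a value the
  // counter could never reach, making the branch dead code.
  return num_generated_tokens == max_new_tokens - 1;
}
```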

Reviewed By: manuelcandales

Differential Revision: D99708774

Copilot AI review requested due to automatic review settings April 10, 2026 18:00
@pytorch-bot

pytorch-bot bot commented Apr 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18819

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 18 Cancelled Jobs, 5 Unrelated Failures

As of commit 7a0d529 with merge base 5e8a0df:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 10, 2026
@meta-codesync
Contributor

meta-codesync bot commented Apr 10, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99708774.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

Copilot AI left a comment


Pull request overview

Fixes correctness/robustness issues in the LLM runner stats/reporting path, improving runtime diagnostics and avoiding undefined behavior in edge cases (warmup/fast runs and uninitialized reads).

Changes:

  • Fix off-by-one “max new tokens reached” condition in TextLLMRunner::generate().
  • Guard token-rate computations in print_report() against division-by-zero.
  • Add default initializers to Stats timestamp/counter members to avoid uninitialized reads.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files reviewed:

  • extension/llm/runner/text_llm_runner.cpp — Fixes a dead/incorrect condition by aligning the comparison with the max_new_tokens - 1 passed to the generator.
  • extension/llm/runner/stats.h — Initializes Stats members to safe defaults and prevents division-by-zero in printed token-rate metrics.


Comment on lines 192 to 194
"\t\tGenerated %" PRIu64
" tokens:\t%f (seconds)\t\t Rate: \t%f (tokens/second)",
stats.num_generated_tokens,

Copilot AI Apr 10, 2026


stats.num_generated_tokens is int64_t, but the format string uses PRIu64. This is a signed/unsigned format mismatch and can lead to undefined behavior or incorrect output on some platforms. Use the matching PRId64 (or cast the value to uint64_t if it’s guaranteed non-negative) to align the format specifier with the argument type.
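The suggested change can be sketched as below, assuming a small helper that formats the counter; the helper name and buffer size are illustrative, not part of the PR.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch of the suggested fix: int64_t is signed, so the matching format
// macro is PRId64; PRIu64 would reinterpret the argument as unsigned.
std::string format_generated(int64_t num_generated_tokens) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Generated %" PRId64 " tokens",
                num_generated_tokens);
  return std::string(buf);
}
```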

@meta-codesync meta-codesync bot changed the title Fix dead condition, division-by-zero, and uninitialized members in LLM stats Fix dead condition, division-by-zero, and uninitialized members in LLM stats (#18819) Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
kirklandsign added a commit that referenced this pull request Apr 10, 2026
Copilot AI review requested due to automatic review settings April 10, 2026 18:15
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



Comment on lines +168 to +176
stats.SCALING_FACTOR_UNITS_PER_SECOND);
inference_time_ms > 0? (stats.num_generated_tokens) / inference_time_ms *
stats.SCALING_FACTOR_UNITS_PER_SECOND
: 0.0);

Copilot AI Apr 10, 2026


Consider adding a small regression test that calls print_report() with Stats timestamps set such that inference_time_ms/prompt_eval_time/eval_time are 0 (or negative) to ensure the rate calculations don’t reintroduce division-by-zero/inf outputs. The LLM runner already has gtest coverage (e.g., extension/llm/runner/test/test_text_llm_runner.cpp), so this should be straightforward to exercise in CI.
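Such a check could be sketched with plain assertions rather than gtest; guarded_rate below is a hypothetical stand-in for the guarded computation in print_report(), not the actual function.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the guarded rate computation in print_report():
// a zero (or negative) elapsed time yields 0.0 instead of inf/nan.
double guarded_rate(int64_t tokens, double elapsed_ms, double scale) {
  return elapsed_ms > 0 ? tokens / elapsed_ms * scale : 0.0;
}
```

A regression test would assert that the zero-duration case stays finite, and that the normal case still produces the expected rate.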

meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
Copilot AI review requested due to automatic review settings April 10, 2026 18:48
@kirklandsign review requested due to automatic review settings April 10, 2026 18:48
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
kirklandsign added a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 10, 2026
Copilot AI review requested due to automatic review settings April 10, 2026 18:59
@kirklandsign review requested due to automatic review settings April 10, 2026 18:59
Copilot AI review requested due to automatic review settings April 10, 2026 19:13
@kirklandsign review requested due to automatic review settings April 10, 2026 19:13

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported


2 participants