Fix dead condition, division-by-zero, and uninitialized members in LLM stats (#18819)

kirklandsign wants to merge 1 commit into main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18819
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: if your PR is affected, please view it below.
❌ 3 New Failures, 18 Cancelled Jobs, 5 Unrelated Failures as of commit 7a0d529 with merge base 5e8a0df.
NEW FAILURES: The following jobs have failed.
CANCELLED JOBS: The following jobs were cancelled. Please retry.
BROKEN TRUNK: The following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99708774.
Pull request overview

Fixes correctness/robustness issues in the LLM runner stats/reporting path, improving runtime diagnostics and avoiding undefined behavior in edge cases (warmup/fast runs and uninitialized reads).

Changes:
- Fix the off-by-one "max new tokens reached" condition in TextLLMRunner::generate().
- Guard token-rate computations in print_report() against division by zero.
- Add default initializers to Stats timestamp/counter members to avoid uninitialized reads.
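The corrected stopping condition can be sketched as follows. This is a standalone illustration using the names from the PR description, not the actual ExecuTorch source: the generator is asked for max_new_tokens - 1 tokens, so the "limit reached" check must compare against that same value.

```cpp
#include <cstdint>

// Sketch of the fixed check: TextTokenGenerator::generate() receives
// max_new_tokens - 1, so "limit reached" means exactly that many tokens.
bool max_new_tokens_reached(int64_t num_generated_tokens, int64_t max_new_tokens) {
  // Before the fix, `num_generated_tokens == max_new_tokens` was dead code:
  // generation always stopped at max_new_tokens - 1 tokens.
  return num_generated_tokens == max_new_tokens - 1;
}
```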
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| extension/llm/runner/text_llm_runner.cpp | Fixes a dead/incorrect condition by aligning the comparison with the max_new_tokens - 1 passed to the generator. |
| extension/llm/runner/stats.h | Initializes Stats members to safe defaults and prevents division-by-zero in printed token-rate metrics. |
    "\t\tGenerated %" PRIu64
    " tokens:\t%f (seconds)\t\t Rate: \t%f (tokens/second)",
    stats.num_generated_tokens,
stats.num_generated_tokens is int64_t, but the format string uses PRIu64. This is a signed/unsigned format mismatch and can lead to undefined behavior or incorrect output on some platforms. Use the matching PRId64 (or cast the value to uint64_t if it’s guaranteed non-negative) to align the format specifier with the argument type.
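The fix the reviewer suggests can be sketched like this (a standalone illustration, not the ExecuTorch source; the function name is invented for the example):

```cpp
#include <cinttypes>
#include <cstdio>
#include <string>

// Format a signed 64-bit token count with a matching conversion specifier.
std::string format_generated(int64_t num_generated_tokens) {
  char buf[64];
  // PRId64 expands to the correct printf specifier for int64_t on this
  // platform; PRIu64 would be a signed/unsigned mismatch (undefined behavior).
  snprintf(buf, sizeof(buf), "Generated %" PRId64 " tokens", num_generated_tokens);
  return std::string(buf);
}
```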
42b87fa to 220e1be (Compare)
220e1be to 0fec9a7 (Compare)
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
extension/llm/runner/stats.h (Outdated)

    stats.SCALING_FACTOR_UNITS_PER_SECOND);
    inference_time_ms > 0 ? (stats.num_generated_tokens) / inference_time_ms *
            stats.SCALING_FACTOR_UNITS_PER_SECOND
        : 0.0);
Consider adding a small regression test that calls print_report() with Stats timestamps set such that inference_time_ms/prompt_eval_time/eval_time are 0 (or negative) to ensure the rate calculations don’t reintroduce division-by-zero/inf outputs. The LLM runner already has gtest coverage (e.g., extension/llm/runner/test/test_text_llm_runner.cpp), so this should be straightforward to exercise in CI.
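The guarded rate computation such a regression test would exercise can be illustrated with a minimal standalone sketch (not the ExecuTorch API; the scaling factor here assumes a millisecond-based clock):

```cpp
#include <cstdint>

// Guarded token rate: returns 0.0 when elapsed time is zero, e.g. a warmup
// run fast enough that the start and end timestamps coincide.
double tokens_per_second(int64_t num_tokens, double elapsed_ms) {
  const double SCALING_FACTOR_UNITS_PER_SECOND = 1000.0;  // assumed: ms clock
  return elapsed_ms > 0
      ? static_cast<double>(num_tokens) / elapsed_ms *
            SCALING_FACTOR_UNITS_PER_SECOND
      : 0.0;  // guard: no division by zero, no inf/nan in the report
}
```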
0fec9a7 to 993b988 (Compare)
993b988 to 070a377 (Compare)
070a377 to 088daf8 (Compare)
088daf8 to 376d9ca (Compare)
376d9ca to 1bc1148 (Compare)
1bc1148 to bf2b8b0 (Compare)
bf2b8b0 to f42844b (Compare)
f42844b to 0f96b60 (Compare)
0f96b60 to 4779b56 (Compare)
4779b56 to 7a0d529 (Compare)
Summary:

Three issues fixed:

1. text_llm_runner.cpp: The condition num_generated_tokens == max_new_tokens was always false because TextTokenGenerator::generate() receives max_new_tokens - 1. Fixed to compare against max_new_tokens - 1.
2. stats.h print_report(): Division by zero when inference/prefill/decode time is zero (e.g., during very fast warmup runs). Added guards matching the pattern already used in stats_to_json_string().
3. stats.h Stats: Added default initializers (= 0) to all timestamp and counter members to prevent undefined behavior from uninitialized reads.

Reviewed By: manuelcandales
Differential Revision: D99708774
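Item 3 amounts to default member initializers on the stats struct. A sketch under assumed field names (illustrative only, not the exact ExecuTorch Stats layout):

```cpp
#include <cstdint>

// Sketch: value-initializing every timestamp and counter means a run that
// ends early (or never records a phase) reads 0 instead of indeterminate
// memory when the report is printed.
struct StatsSketch {
  int64_t model_load_start_ms = 0;   // hypothetical field names
  int64_t model_load_end_ms = 0;
  int64_t inference_start_ms = 0;
  int64_t inference_end_ms = 0;
  int64_t num_prompt_tokens = 0;
  int64_t num_generated_tokens = 0;
};
```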