chore: consolidate gptoss example + fixes #283
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request focuses on extensive documentation updates, example consolidation, and internal API refactoring. Key changes include restructuring the GPT-OSS-120B benchmarking examples, updating CLI argument references in READMEs, and transitioning the CPU affinity configuration to a boolean flag. Additionally, the internal issue_query method was renamed to issue. Feedback is provided regarding the removal of conditional checks in the endpoint client configuration, which inadvertently prevents the programmatic override of adapters and accumulators.
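The override concern can be illustrated with a minimal sketch. It is not the project's actual `config.py`: the `ClientConfig` class, the `_DEFAULTS` registry, and the adapter/accumulator names are all hypothetical. It only shows the pattern the feedback asks to keep, where defaults are derived from `api_type` only when the caller has not supplied a value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: api_type -> (default adapter, default accumulator).
_DEFAULTS = {
    "openai": ("OpenAIAdapter", "OpenAIAccumulator"),
    "sglang": ("SGLangAdapter", "SGLangAccumulator"),
}


@dataclass
class ClientConfig:
    """Illustrative stand-in for an endpoint client config."""

    api_type: str = "openai"
    adapter: Optional[str] = None
    accumulator: Optional[str] = None

    def __post_init__(self) -> None:
        default_adapter, default_accumulator = _DEFAULTS[self.api_type]
        # Conditional checks: fill in a default only when nothing was passed,
        # so a programmatic override survives construction.
        if self.adapter is None:
            self.adapter = default_adapter
        if self.accumulator is None:
            self.accumulator = default_accumulator
```

Dropping the two `is None` checks would overwrite whatever the caller passed, which is the regression the review comment describes.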
Pull request overview
This PR updates documentation and examples to align with recent endpoint-client refactors and consolidates the GPT-OSS-120B end-to-end example (configs + accuracy scripts) so it runs cleanly again.
Changes:
- Update docs/READMEs to reflect refactored endpoint client API (`issue`/`poll`/`drain`, updated worker internals, CLI flag updates).
- Consolidate GPT-OSS-120B example guidance (vLLM + SGLang) and remove the separate SGLang-only example README.
- Add standalone evaluation scripts (GPQA/AIME25/LiveCodeBench) and plumb `--force-regenerate` through dataset generation in the example runner.
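As a rough illustration of what plumbing `--force-regenerate` through dataset generation might look like, here is a minimal sketch. The `generate_dataset` function, its return value, and the dataset names are hypothetical stand-ins; the real `run.py` may structure this differently.

```python
import argparse


def generate_dataset(name: str, force_regenerate: bool = False) -> str:
    """Hypothetical generator: report whether a cached copy would be reused."""
    action = "regenerate" if force_regenerate else "reuse-cache"
    return f"{name}:{action}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Example runner sketch")
    parser.add_argument(
        "--force-regenerate",
        action="store_true",
        help="Rebuild datasets even if a cached copy exists.",
    )
    return parser


def run(argv: list[str]) -> list[str]:
    args = build_parser().parse_args(argv)
    # Pass the flag through to every dataset generator unchanged.
    return [
        generate_dataset(name, force_regenerate=args.force_regenerate)
        for name in ("gpqa", "aime25", "livecodebench")
    ]
```

Calling `run(["--force-regenerate"])` forwards the flag to each generator, while `run([])` leaves the cached path in place.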
Reviewed changes
Copilot reviewed 13 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/inference_endpoint/profiling/README.md | Updates profiling callouts to match refactored client/worker method names. |
| src/inference_endpoint/endpoint_client/config.py | Adjusts default resolution so adapter/accumulator are always derived from api_type. |
| src/inference_endpoint/endpoint_client/README.md | Updates client API usage from issue_query to issue. |
| src/inference_endpoint/dataset_manager/README.md | Fixes import path for DatasetFormat. |
| examples/README.md | Updates GPT-OSS-120B example description and removes separate SGLang example entry. |
| examples/07_GPT-OSS-120B_SGLang_Example/README.md | Removes redundant SGLang-only README (content consolidated into 04 example). |
| examples/04_GPTOSS120B_Example/run.py | Adds force_regenerate passthrough for dataset generation. |
| examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml | Updates default endpoint port to 30000. |
| examples/04_GPTOSS120B_Example/eval_livecodebench.py | Adds standalone LiveCodeBench re-scoring script from an existing report dir. |
| examples/04_GPTOSS120B_Example/eval_gpqa.py | Adds standalone GPQA re-scoring script from an existing report dir. |
| examples/04_GPTOSS120B_Example/eval_aime.py | Adds standalone AIME25 re-scoring script from an existing report dir. |
| examples/04_GPTOSS120B_Example/Readme.md | Consolidates end-to-end instructions for vLLM/SGLang + accuracy suite + troubleshooting. |
| examples/02_ServerBenchmarking/README.md | Updates CLI example flags to current --endpoints/--dataset/--model usage. |
| docs/CLI_QUICK_REFERENCE.md | Updates init command guidance to include concurrency and removes redundant line. |
| docs/CLIENT_PERFORMANCE_TUNING.md | Updates CPU affinity docs to match enable_cpu_affinity + --no-cpu-affinity. |
| AGENTS.md | Updates repository structure notes for core/ layout and record location. |
Pull request overview
Consolidates and repairs the gpt-oss-120b examples and related client configuration behavior so the selected endpoint_config.api_type reliably drives adapter/accumulator selection end-to-end after recent refactors.
Changes:
- Propagate `endpoint_config.api_type` into `settings.client.api_type` at config construction time, and make `HTTPClientConfig.with_updates()` clear stale auto-resolved fields when `api_type` changes.
- Update benchmark execution/docs to rely on the propagated `api_type` (no runtime patching in `execute.py`) and refresh client/profiling documentation for renamed methods.
- Consolidate GPT-OSS examples (remove the separate SGLang example README, enhance the unified example, add accuracy eval scripts, and update example docs/imports).
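The stale-field behavior described in the first bullet can be sketched as follows. This is an assumption-laden illustration, not the actual `HTTPClientConfig`: the `ClientConfig` class, the `_DEFAULT_ADAPTERS` mapping, and the adapter names are hypothetical, and only the adapter field is modeled.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

# Hypothetical mapping from api_type to an auto-resolved adapter name.
_DEFAULT_ADAPTERS = {"openai": "OpenAIAdapter", "sglang": "SGLangAdapter"}


@dataclass(frozen=True)
class ClientConfig:
    """Illustrative stand-in for a client config with an auto-resolved field."""

    api_type: str = "openai"
    adapter: Optional[str] = None

    def __post_init__(self) -> None:
        # Resolve the adapter from api_type when none was given.
        if self.adapter is None:
            object.__setattr__(self, "adapter", _DEFAULT_ADAPTERS[self.api_type])

    def with_updates(self, **updates: Any) -> "ClientConfig":
        api_type_changed = (
            "api_type" in updates and updates["api_type"] != self.api_type
        )
        # Clear the previously resolved adapter so it is re-derived for the
        # new api_type, unless the caller passes an adapter in the same call.
        if api_type_changed and "adapter" not in updates:
            updates["adapter"] = None
        return replace(self, **updates)
```

Without the clearing step, `ClientConfig().with_updates(api_type="sglang")` would keep the adapter resolved for the old `api_type`. Note that this sketch cannot tell an explicitly chosen adapter from an auto-resolved one, so it clears both when `api_type` changes; the PR's implementation may track that distinction.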
Reviewed changes
Copilot reviewed 16 out of 19 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/config/test_schema.py | Adds regression tests ensuring api_type propagation and adapter/accumulator resolution behave correctly across with_updates(). |
| src/inference_endpoint/profiling/README.md | Updates profiling guide to match renamed client/worker methods. |
| src/inference_endpoint/endpoint_client/config.py | Adds HTTPClientConfig.with_updates() logic to clear adapter/accumulator when api_type changes. |
| src/inference_endpoint/endpoint_client/README.md | Updates docs to reflect HTTPEndpointClient.issue() API naming. |
| src/inference_endpoint/dataset_manager/README.md | Fixes import guidance for DatasetFormat (not exported from package root). |
| src/inference_endpoint/config/schema.py | Adds validator to propagate endpoint_config.api_type into the internal HTTP client config at construction. |
| src/inference_endpoint/commands/benchmark/execute.py | Removes runtime api_type override, relying on schema propagation. |
| examples/README.md | Updates GPT-OSS example description and removes separate SGLang example entry. |
| examples/07_GPT-OSS-120B_SGLang_Example/README.md | Deletes now-redundant standalone SGLang example README. |
| examples/04_GPTOSS120B_Example/run.py | Adds force_regenerate passthrough to dataset generators. |
| examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml | Updates endpoint default port in the example config. |
| examples/04_GPTOSS120B_Example/eval_livecodebench.py | Adds standalone scoring script for LiveCodeBench from a saved dataset/report. |
| examples/04_GPTOSS120B_Example/eval_gpqa.py | Adds standalone scoring script for GPQA from a saved dataset/report. |
| examples/04_GPTOSS120B_Example/eval_aime.py | Adds standalone scoring script for AIME25 from a saved dataset/report. |
| examples/04_GPTOSS120B_Example/Readme.md | Expands and consolidates end-to-end instructions for vLLM + SGLang and accuracy workflows. |
| examples/02_ServerBenchmarking/README.md | Updates CLI usage example to current long-form flags. |
| docs/CLI_QUICK_REFERENCE.md | Updates init command quick reference to include concurrency template option. |
| docs/CLIENT_PERFORMANCE_TUNING.md | Updates CPU-affinity documentation to match enable_cpu_affinity config/CLI. |
| AGENTS.md | Updates repo structure documentation (core/types, record location). |
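The `enable_cpu_affinity` / `--no-cpu-affinity` pairing mentioned for docs/CLIENT_PERFORMANCE_TUNING.md is a common way to expose a boolean setting. The sketch below uses plain `argparse` and assumes the setting defaults to enabled; how the project's CLI actually wires the flag is not shown in this PR summary.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CPU affinity flag sketch")
    # store_false makes the destination default to True, so affinity stays
    # enabled unless --no-cpu-affinity is passed (assumed default).
    parser.add_argument(
        "--no-cpu-affinity",
        dest="enable_cpu_affinity",
        action="store_false",
        help="Disable CPU pinning for client workers.",
    )
    return parser
```

With this shape, the config field stays a positive boolean (`enable_cpu_affinity`) while the CLI exposes only the opt-out switch.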
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
f782cec to 5b19767
What does this PR do?
Consolidates the GPT-OSS example and pipe-cleans it to make sure it works end to end after recent refactors.