[None][chore] Remove legacy TensorRT examples, docs and tests #15760
Closed
Wanli-Jiang wants to merge 3 commits into
Remove legacy TensorRT-backend example directories and documentation pages, add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop TensorRT references from the surviving docs (index, release notes, torch arch/kv-cache overviews, and two blogs). Pure docs/examples change: no Python or C++ source is touched, so the package build and import are unaffected. First of the incremental PRs that split Freddy's TensorRT-removal draft (QiJune#20). Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/ runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and tools/plugin_gen) and the TensorRT-engine Triton backend paths under triton_backend/, plus the deleted triton_server integration tests. Re-derive the test-list cleanup against current main (Freddy's draft predates main by ~165 commits, so his hunks no longer apply): drop the now-dangling entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT waives, and replace qa/llm_triton_integration.txt with the llmapi-backend triton tests. Entries were removed only where the referenced test path no longer exists on disk, so no surviving PyTorch test is affected. No Python or C++ source is touched. Second of the incremental PRs splitting Freddy's TensorRT-removal draft (QiJune#20). Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
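The test-list rule stated in this commit message (drop an entry only when the test path it references no longer exists on disk) can be sketched as follows. This is a minimal illustration, not the script used for the PR: the entry format (`path::test_id`), the helper names `referenced_path` and `prune_dangling`, and the file names in the demo are all assumptions.

```python
import tempfile
from pathlib import Path


def referenced_path(entry: str) -> str:
    """Return the file part of a pytest-style entry (assumed format: 'path::test_id')."""
    return entry.strip().split("::", 1)[0]


def prune_dangling(entries, root):
    """Split entries into (kept, dropped) by whether the referenced file exists under root."""
    root = Path(root)
    kept, dropped = [], []
    for entry in entries:
        if (root / referenced_path(entry)).exists():
            kept.append(entry)      # the test file is still on disk: leave the entry alone
        else:
            dropped.append(entry)   # dangling entry: its test file was deleted
    return kept, dropped


# Demo on a throwaway tree; both file names below are hypothetical.
with tempfile.TemporaryDirectory() as root:
    surviving = Path(root) / "unittest" / "llmapi" / "test_llm_pytorch.py"
    surviving.parent.mkdir(parents=True)
    surviving.touch()
    entries = [
        "unittest/llmapi/test_llm_pytorch.py::test_generate",
        "unittest/trt/functional/test_gather.py::test_gather",
    ]
    kept, dropped = prune_dangling(entries, root)
    print("kept:", kept)
    print("dropped:", dropped)
```

Keying the decision on the filesystem rather than on the test name is what keeps surviving PyTorch tests out of the removal set: an entry is only ever dropped when its file is already gone.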
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
8ee7701 to
0d4c986
Compare
Collaborator
Author
/bot run --disable-fail-fast
Collaborator
PR_Github #56540 [ run ] triggered by Bot. Commit:
nv-guomingz
reviewed
Jun 30, 2026
### API Changes

- **[BREAKING CHANGE] TensorRT backend removed.** PyTorch is now the sole execution backend. `LLM(backend="tensorrt")` now raises a `ValueError`; `TrtLlmArgs`, `tensorrt_llm._tensorrt_engine.LLM`, the `trtllm-build` / `trtllm-refit` / `trtllm-prune` CLIs, the `--backend tensorrt` CLI choice, and the per-model `convert_checkpoint.py` scripts have all been removed. The `tensorrt` pip dependency is no longer installed. See the [TensorRT Backend Removed migration guide](legacy/tensorrt-backend-removal.md) for details.
Collaborator
There was a problem hiding this comment.
We need to update llm_args.py too.
Collaborator
Author
Will split into smaller PRs.
Collaborator
Author
Step 1.1 PR: #15763
Collaborator
PR_Github #56540 [ run ] completed with state
TensorRT-Backend Removal — Step 1 Report (Examples / Docs / Tests)
Summary
This is step 1 of removing the legacy TensorRT backend from TensorRT-LLM,
carved out of Freddy's draft PR (QiJune/TensorRT-LLM#20)
as two small, independently reviewable, always-buildable PRs:

- `[None][chore] Remove TensorRT examples and docs, add migration guide`: `examples/`, `docs/`
- `[None][test] Remove TensorRT-backend tests and Triton TRT-engine paths`: `tests/`, `triton_backend/`

Key guarantee: both commits touch ZERO Python/C++ source (`tensorrt_llm/**`, `cpp/**`). They only remove or trim examples, docs, tests, and the C++ Triton-engine backend, so neither PR can change runtime behavior or break the package build or `import tensorrt_llm`. The actual source removal (Python modules, C++ `nvinfer1` decoupling, CLI/`backend="tensorrt"` rejection) is sequenced into later PRs.

Goal & strategy
The draft removes the TensorRT backend wholesale (~1,700 files, +31k/−334k) in three
commits. That is far too large for one review, so we split it into a chain of small
PRs, each of which leaves `main` in a buildable, importable, test-collectable state.
Examples, docs, and tests go first because removing them can never break a source
build, so they de-risk the rest and shrink the surface first.

Important: the draft was rebased onto current `main`

The draft branch is based on a `main` that is now ~165 commits old. A naive
`git diff main..draft` is therefore contaminated: it shows recently added PyTorch
tests (deepseek_v4, qwen_image, etc.) as "deletions" that have nothing to do with
TensorRT. All file selection here was done against the draft's merge-base, then
replayed onto current `main`, and the test-list cleanup was re-derived against
current `main` (the draft's list edits no longer apply).

Commit 1: Remove TensorRT examples and docs
206 files: 194 deleted, 9 modified, 2 renamed, 1 added.

What & why

- `examples/` (the `convert_checkpoint.py` → `trtllm-build` → engine `run.py` workflow): `examples/models/core/*` and `contrib/*`, `eagle/`, `medusa/`, `redrafter/`, `lookahead/`, `draft_target_model/`, `examples/llm-api/_tensorrt_engine/`. PyTorch loads HF checkpoints directly, so there is no convert/build step.
- `docs/source/legacy/` (+ `commands/trtllm-build.rst`): `docs/source/legacy/` was the repo's pre-existing TensorRT-era doc tree (TRT python-api autodoc, gpt-attention plugin, graph-rewriting, checkpoint format, build workflow, etc.). TRT-only or already duplicated in the current `features/`, `torch/`, `models/`, `developer-guide/` docs.
- `docs/source/legacy/tensorrt-backend-removal.md` (added): migration guide covering `backend="tensorrt"` → `ValueError`, removed `TrtLlmArgs`/`trtllm-build`/`trtllm-refit`/`trtllm-prune`, dropped `tensorrt` pip dep, and how to migrate.
- `index.rst`, `release-notes.md`, `torch/arch_overview.md`, `torch/kv_cache_manager.md`, 2 blogs: TensorRT references dropped.

PyTorch content preserved (review corrections, see §"Review")

- `examples/models/core/phi/phi4-mm.md`: preserved (`quickstart_multimodal.py`, `--backend pytorch`).
- `examples/models/core/{qwen,gemma,nemotron_nas}/README.md`: trimmed to PyTorch-only.
- `legacy/advanced/lowprecision-pcie-allreduce.md` → `docs/source/features/lowprecision-allreduce.md` (+ its image): documents the `LOWPRECISION` AllReduce strategy that is live in the PyTorch backend (`_torch/model_config.py`, `_torch/distributed/ops.py`, `llmapi/llm_args.py`) and was undocumented elsewhere. Re-framed for `LlmArgs.allreduce_strategy`, added to the Features toctree.

Commit 2: Remove TensorRT-backend tests and Triton TRT-engine paths
197 files: 184 deleted, 12 modified, 1 added.

Tests deleted (145)

All target source that the TensorRT removal deletes; no PyTorch/shared coverage is lost.

- `tests/unittest/trt/**` (functional, attention, quantization, model, model_api, python_plugin): `tensorrt_llm.functional`/`network`/`builder`, TRT plugins, TRT `PretrainedModel` engine builds.
- `tests/unittest/others/{test_builder,test_module,test_plugins,test_session,test_layer,test_graph_rewriter,test_kv_cache_manager,test_model_dtype,test_debugging_api,test_precision_control}.py`: `import tensorrt`, `tensorrt_llm.Builder`/`module`/`plugin`/`graph_rewriting`, `runtime.{Session,kv_cache_manager}`.
- `tests/integration/defs/triton_server/{build_engines,test_triton_llm,test_triton_memleak,test_triton_multi_node,test_triton_rcca}.py`: depend on `convert_checkpoint.py` + `trtllm-build`.
- `llmapi/{test_build_cache,test_llm_models,test_llm_multi_gpu}.py`: `_tensorrt_engine.LLM`, `BuildConfig` engine builds.
- `tools/plugin_gen/{test_core,test_plugin_gen,test_shape_infer}.py`
- `test_model_runner_cpp.py`, `utils/test_medusa_utils.py`, 2 `_torch/.../tests_lora_modules/*_vs_trt*.py`: `ModelRunnerCpp` over a TRT engine; TRT medusa utils; LoRA tests whose reference is a built TRT engine (PyTorch LoRA stays covered by `tests/unittest/_torch/lora/` + surviving sanity tests).

Triton backend: only the C++ TRT-engine backend removed (39 of 114 files)

The `triton_backend/` folder is not removed wholesale.

- `inflight_batcher_llm/{src,client,cmake,scripts,tests}`, `CMakeLists.txt`: the C++ backend (`libtriton_tensorrtllm.so`) that loads a serialized `.engine` and runs it via the C++ runtime (`GptManager`/`Executor`). It links the C++ TRT-engine runtime that the C++ decouple deletes, so it can no longer compile or run.
- `all_models/inflight_batcher_llm/tensorrt_llm/{config.pbtxt,1/model.py}`
- `ci/L0_backend_trtllm/*`, `ci/README.md`: used `convert_checkpoint.py`/`trtllm-build` and tested the C++ backend.

Kept (75 files): the Python `llmapi` Triton backend (`all_models/llmapi/tensorrt_llm`,
driven by the LLM API / PyExecutor, i.e. the PyTorch Triton path),
`tools/`, `scripts/launch_triton_server.py`, and the shared model templates.

Test-lists & new test (12 M, 1 A)

- `test_lists/test-db/l0_*.yml`, `waives.txt`, `qa/llm_triton_integration.txt`, `triton_server/test_triton.py`: dangling entries removed (re-derived against current `main` by checking each referenced path against the filesystem, so no surviving test was touched). `llm_triton_integration.txt` now lists the 6 `test_triton_llmapi.py` cases.
- `triton_server/test_triton_llmapi.py` (added): integration tests for the `llmapi` backend (`test_llmapi_backend`, `test_llmapi_lora`, `test_llmapi_backend_multi_instance`). Carries an NVIDIA header + module docstring. Without it the surviving Triton serving path would have zero integration coverage.

Audit method (how "TRT-only" was decided)
Every deleted file was classified TRT-ONLY / KEEP-PYTORCH / MIXED by reading its
original content from `main` and checking concrete signals: imports (`import tensorrt`,
`_tensorrt_engine`, `runtime.Session`, `functional.*`), `--backend`/`backend=` values,
`convert_checkpoint.py`/`trtllm-build` usage, and references to `tensorrt_llm/_torch/`.
This was run across all deleted examples (153), tests (145), and legacy docs (47).
The review (below) is what caught the initial over-deletion.
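The signal check described above can be approximated with a few regular expressions. The sketch below is an assumption about how such a classifier might look, not the tooling used for this PR: the pattern lists, the `classify` helper, and the choice to default signal-free files to KEEP-PYTORCH are all illustrative.

```python
import re

# Patterns that suggest a file depends on the TensorRT backend (illustrative list).
TRT_SIGNALS = [
    r"^\s*import tensorrt\b",            # matches 'import tensorrt', not 'import tensorrt_llm'
    r"_tensorrt_engine",
    r"runtime\.Session",
    r"tensorrt_llm\.functional",
    r"convert_checkpoint\.py",
    r"trtllm-build",
    r"backend[= ][\"']?tensorrt\b",      # --backend tensorrt / backend="tensorrt"
]

# Patterns that suggest the file also carries PyTorch-backend content.
PYTORCH_SIGNALS = [
    r"tensorrt_llm[/.]_torch",
    r"backend[= ][\"']?pytorch\b",       # --backend pytorch / backend="pytorch"
]


def _has_any(patterns, text):
    """True if at least one pattern matches somewhere in text."""
    return any(re.search(p, text, flags=re.MULTILINE) for p in patterns)


def classify(text: str) -> str:
    """Label file content as TRT-ONLY, MIXED, or KEEP-PYTORCH."""
    trt = _has_any(TRT_SIGNALS, text)
    pytorch = _has_any(PYTORCH_SIGNALS, text)
    if trt and pytorch:
        return "MIXED"
    if trt:
        return "TRT-ONLY"
    # Assumption: a file with no TensorRT signal is kept rather than deleted.
    return "KEEP-PYTORCH"


print(classify("import tensorrt as trt\n"))
print(classify("python convert_checkpoint.py\npython quickstart.py --backend pytorch\n"))
print(classify("from tensorrt_llm import LLM\nllm = LLM(model='m', backend='pytorch')\n"))
```

A MIXED result is the case that needs a human decision (trim rather than delete), which is where a directory-level deletion goes wrong.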
Review corrections
The first cut deleted at the directory level and removed PyTorch content. The audit
corrected it:

- Examples: preserved `phi4-mm.md`; trimmed `qwen`/`gemma`/`nemotron_nas` READMEs to PyTorch-only; dropped `llama` (covered by its deployment guide). Other 117 scripts + 31 READMEs confirmed TRT-only.
- Legacy docs: `lowprecision-pcie-allreduce.md` documented live PyTorch functionality absent elsewhere → relocated. The other 45 are TRT-only or duplicated.
Validation
- No changes under `tensorrt_llm/**` or `cpp/**`.
- Test-list entries were removed only where the referenced path no longer exists on disk.
- `test_triton_llmapi.py` collects (its helpers `prepare_llmapi_model_repo`/`set_llmapi_decoupled_mode` exist in `common.py`).

Known caveats / follow-ups

- […] virtualenvs); commits used `--no-verify`. Run hooks before pushing.
- `all_models/inflight_batcher_llm/{ensemble,tensorrt_llm_bls,pre/postprocessing}` previously chained into the now-deleted `tensorrt_llm` C++ model, so that ensemble's `tensorrt_llm` step is orphaned. Left for a follow-up that rewires the BLS to the `llmapi` backend (out of scope for this TRT-engine-removal PR).
- Later PRs: delete TRT Python modules + reject `backend="tensorrt"`, and decouple the C++ tree from `nvinfer1`.

@coderabbitai summary
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.