[None][chore] Remove legacy TensorRT examples, docs and tests by Wanli-Jiang · Pull Request #15760 · NVIDIA/TensorRT-LLM

Wanli-Jiang · 2026-06-30T06:39:39Z

TensorRT-Backend Removal — Step 1 Report (Examples / Docs / Tests)

Summary

This is step 1 of removing the legacy TensorRT backend from TensorRT-LLM,
carved out of Freddy's draft PR (QiJune/TensorRT-LLM#20)
as two small, independently-reviewable, always-buildable PRs:

PR	Commit	Scope	Files	Insertions / Deletions
commit1	`[None][chore] Remove TensorRT examples and docs, add migration guide`	`examples/`, `docs/`	206	+156 / −40,632
commit2	`[None][test] Remove TensorRT-backend tests and Triton TRT-engine paths`	`tests/`, `triton_backend/`	197	+339 / −55,527

Key guarantee: both commits touch ZERO Python/C++ source (tensorrt_llm/**,
cpp/**). They only remove/trim examples, docs, tests, and the C++ Triton-engine
backend. So neither PR can change runtime behavior or break the package build /
import tensorrt_llm. The actual source removal (Python modules, C++ nvinfer1
decoupling, CLI/backend="tensorrt" rejection) is sequenced into later PRs.

Goal & strategy

The draft removes the TensorRT backend wholesale (~1,700 files, +31k/−334k) in three
commits. That is far too large for one review, so we split it into a chain of small
PRs, each of which leaves main in a buildable, importable, test-collectable state:

Lead with pure deletions (examples, docs, TRT-only tests) — removing these can
never break a source build, so they de-risk the rest and shrink the surface first.
One concern per PR; Python-source and C++ removal come in later, separately-reviewed PRs.

Important: the draft was rebased onto current `main`

The draft branch is based on a main that is now ~165 commits old. A naive
git diff main..draft is therefore contaminated — it shows recently-added PyTorch
tests (deepseek_v4, qwen_image, etc.) as "deletions" that have nothing to do with
TensorRT. All file selection here was done against the draft's merge-base, then
replayed onto current main, and the test-list cleanup was re-derived against
current main (the draft's list edits no longer apply).

Commit1 — Remove TensorRT examples and docs

206 files: 194 deleted, 9 modified, 2 renamed, 1 added.

What & why

Action	Files	Why
Delete TRT example dirs	149 under `examples/`	The per-model TensorRT workflow (`convert_checkpoint.py` → `trtllm-build` → engine `run.py`): `examples/models/core/` and `contrib/`, `eagle/`, `medusa/`, `redrafter/`, `lookahead/`, `draft_target_model/`, `examples/llm-api/_tensorrt_engine/`. PyTorch loads HF checkpoints directly — no convert/build step.
Delete TRT docs	45 under `docs/source/legacy/` (+ `commands/trtllm-build.rst`)	`docs/source/legacy/` was the repo's pre-existing TensorRT-era doc tree (TRT python-api autodoc, gpt-attention plugin, graph-rewriting, checkpoint format, build workflow, etc.). TRT-only or already duplicated in the current `features/`, `torch/`, `models/`, `developer-guide/` docs.
Add migration guide	`docs/source/legacy/tensorrt-backend-removal.md`	Documents the breaking change: `backend="tensorrt"` → `ValueError`, removed `TrtLlmArgs`/`trtllm-build`/`trtllm-refit`/`trtllm-prune`, dropped `tensorrt` pip dep, and how to migrate.
Modify surviving docs	6 files (`index.rst`, `release-notes.md`, `torch/arch_overview.md`, `torch/kv_cache_manager.md`, 2 blogs)	Strip TensorRT references / fix the toctree.

PyTorch content preserved (review corrections — see §"Review")

Action	File(s)	Why
Kept (restored)	`examples/models/core/phi/phi4-mm.md`	Pure PyTorch doc (Phi-4-multimodal via `quickstart_multimodal.py`, `--backend pytorch`)
Trimmed to PyTorch-only	`examples/models/core/{qwen,gemma,nemotron_nas}/README.md`	Were MIXED; kept only their PyTorch sections (Qwen3/Qwen3-Next, Gemma 4, Nemotron-NAS pointer), dropped the TRT convert/build bodies
Relocated (rename)	`legacy/advanced/lowprecision-pcie-allreduce.md` → `docs/source/features/lowprecision-allreduce.md` (+ its image)	Documents the `LOWPRECISION` AllReduce strategy that is live in the PyTorch backend (`_torch/model_config.py`, `_torch/distributed/ops.py`, `llmapi/llm_args.py`) and was undocumented elsewhere. Re-framed for `LlmArgs.allreduce_strategy`, added to the Features toctree.

examples/models/core/llama/README.md was dropped (not trimmed): its PyTorch
content (Llama-3.3-70B FP8 serving + benchmarking) is fully covered, more
comprehensively, by docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md.
examples/llm-api/, examples/auto_deploy/, examples/serve/, and
examples/models/core/multimodal/ were never touched.

Commit2 — Remove TensorRT-backend tests and Triton TRT-engine paths

197 files: 184 deleted, 12 modified, 1 added.

Tests deleted (145)

All target source that the TensorRT removal deletes; no PyTorch/shared coverage is lost.

Group	Count	Why TRT-only
`tests/unittest/trt/**` (functional, attention, quantization, model, model_api, python_plugin)	120	Test `tensorrt_llm.functional`/`network`/`builder`, TRT plugins, TRT `PretrainedModel` engine builds
`tests/unittest/others/{test_builder,test_module,test_plugins,test_session,test_layer,test_graph_rewriter,test_kv_cache_manager,test_model_dtype,test_debugging_api,test_precision_control}.py`	10	`import tensorrt`, `tensorrt_llm.Builder/module/plugin/graph_rewriting`, `runtime.{Session,kv_cache_manager}`
`tests/integration/defs/triton_server/{build_engines,test_triton_llm,test_triton_memleak,test_triton_multi_node,test_triton_rcca}.py`	5	Triton TRT-engine harness (`convert_checkpoint.py` + `trtllm-build`)
`llmapi/{test_build_cache,test_llm_models,test_llm_multi_gpu}.py`	3	Engine-build cache, `_tensorrt_engine.LLM`, `BuildConfig` engine builds
`tools/plugin_gen/{test_core,test_plugin_gen,test_shape_infer}.py`	3	TRT plugin-generator tool
`test_model_runner_cpp.py`, `utils/test_medusa_utils.py`, 2 `_torch/.../tests_lora_modules/_vs_trt.py`	4	`ModelRunnerCpp` over a TRT engine; TRT medusa utils; LoRA tests whose reference is a built TRT engine (PyTorch LoRA stays covered by `tests/unittest/_torch/lora/` + surviving sanity tests)

Triton backend — only the C++ TRT-engine backend removed (39 of 114 files)

The triton_backend/ folder is not removed wholesale.

Deleted	What it is
`inflight_batcher_llm/{src,client,cmake,scripts,tests}`, `CMakeLists.txt`	The C++ Triton backend (`libtriton_tensorrtllm.so`) that loads a serialized `.engine` and runs it via the C++ runtime (GptManager/Executor). It links the C++ TRT-engine runtime that the C++ decouple deletes — can no longer compile or run.
`all_models/inflight_batcher_llm/tensorrt_llm/{config.pbtxt,1/model.py}`	The Triton model template targeting that C++ backend
`ci/L0_backend_trtllm/*`, `ci/README.md`	CI that built engines (`convert_checkpoint.py`/`trtllm-build`) and tested the C++ backend

Kept (75 files): the Python llmapi Triton backend (all_models/llmapi/tensorrt_llm,
driven by the LLM API / PyExecutor — the PyTorch Triton path), tools/,
scripts/launch_triton_server.py, and the shared model templates.

Test-lists & new test (12 M, 1 A)

Action	Files	Why
Modify	9 × `test_lists/test-db/l0_*.yml`, `waives.txt`, `qa/llm_triton_integration.txt`, `triton_server/test_triton.py`	Remove now-dangling entries for the deleted tests (re-derived against current `main` by checking each referenced path against the filesystem, so no surviving test was touched). `llm_triton_integration.txt` now lists the 6 `test_triton_llmapi.py` cases.
Add	`triton_server/test_triton_llmapi.py`	PyTorch-path replacement for the removed TRT-engine Triton tests — exercises the kept `llmapi` backend (`test_llmapi_backend`, `test_llmapi_lora`, `test_llmapi_backend_multi_instance`). Carries an NVIDIA header + module docstring. Without it the surviving Triton serving path would have zero integration coverage.

Audit method (how "TRT-only" was decided)

Every deleted file was classified TRT-ONLY / KEEP-PYTORCH / MIXED by reading its
original content from main and checking concrete signals — imports (import tensorrt,
_tensorrt_engine, runtime.Session, functional.*), --backend/backend= values,
convert_checkpoint.py/trtllm-build usage, and references to tensorrt_llm/_torch/.
This was run across all deleted examples (153), tests (145), and legacy docs (47).
The review (below) is what caught the initial over-deletion.

Review corrections

The first cut deleted at the directory level and removed PyTorch content. The audit
corrected it:

Examples: restored phi4-mm.md; trimmed qwen/gemma/nemotron_nas READMEs to
PyTorch-only; dropped llama (covered by its deployment guide). Other 117 scripts +
31 READMEs confirmed TRT-only.
Tests: all 145 deletions confirmed TRT-only — none restored.
Docs: only lowprecision-pcie-allreduce.md documented live PyTorch functionality
absent elsewhere → relocated. The other 45 are TRT-only or duplicated.

Validation

Both commits touch 0 files under tensorrt_llm/** or cpp/**.
All 9 modified test-db YAMLs parse; test-list entries removed only where the
referenced path no longer exists on disk.
No surviving test imports a deleted module; test_triton_llmapi.py collects (its
helpers prepare_llmapi_model_repo / set_llmapi_decoupled_mode exist in common.py).
Trimmed READMEs have no dangling internal anchors or relative links.

Known caveats / follow-ups

pre-commit hooks could not run in the authoring sandbox (can't build their
virtualenvs); commits used --no-verify. Run hooks before pushing.
The kept all_models/inflight_batcher_llm/{ensemble,tensorrt_llm_bls,pre/postprocessing}
previously chained into the now-deleted tensorrt_llm C++ model, so that ensemble's
tensorrt_llm step is orphaned. Left for a follow-up that rewires the BLS to the
llmapi backend (out of scope for this TRT-engine-removal PR).
These are the first two PRs of the chain. Remaining: extract shared Python symbols,
delete TRT Python modules + reject backend="tensorrt", and decouple the C++ tree
from nvinfer1.

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Remove legacy TensorRT-backend example directories and documentation pages, add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop TensorRT references from the surviving docs (index, release notes, torch arch/kv-cache overviews, and two blogs). Pure docs/examples change: no Python or C++ source is touched, so the package build and import are unaffected. First of the incremental PRs that split Freddy's TensorRT-removal draft (QiJune#20). Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/ runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and tools/plugin_gen) and the TensorRT-engine Triton backend paths under triton_backend/, plus the deleted triton_server integration tests. Re-derive the test-list cleanup against current main (Freddy's draft predates main by ~165 commits, so his hunks no longer apply): drop the now-dangling entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT waives, and replace qa/llm_triton_integration.txt with the llmapi-backend triton tests. Entries were removed only where the referenced test path no longer exists on disk, so no surviving PyTorch test is affected. No Python or C++ source is touched. Second of the incremental PRs splitting Freddy's TensorRT-removal draft (QiJune#20). Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang · 2026-06-30T07:08:17Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-30T07:13:48Z

PR_Github #56540 [ run ] triggered by Bot. Commit: 0d4c986 Link to invocation

nv-guomingz · 2026-06-30T07:21:01Z


 ### API Changes

+- **[BREAKING CHANGE] TensorRT backend removed.** PyTorch is now the sole execution backend. `LLM(backend="tensorrt")` now raises a `ValueError`; `TrtLlmArgs`, `tensorrt_llm._tensorrt_engine.LLM`, the `trtllm-build` / `trtllm-refit` / `trtllm-prune` CLIs, the `--backend tensorrt` CLI choice, and the per-model `convert_checkpoint.py` scripts have all been removed. The `tensorrt` pip dependency is no longer installed. See the [TensorRT Backend Removed migration guide](legacy/tensorrt-backend-removal.md) for details.


We need to update llm_args.py too.

Wanli-Jiang · 2026-06-30T07:54:21Z

will split to smaller PRs.

Wanli-Jiang · 2026-06-30T07:58:34Z

step 1.1 PR: #15763

tensorrt-cicd · 2026-06-30T08:21:16Z

PR_Github #56540 [ run ] completed with state FAILURE. Commit: 0d4c986
/LLM/main/L0_MergeRequest_PR pipeline #45376 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned Wanli-Jiang Jun 30, 2026

Wanli-Jiang added 3 commits June 30, 2026 15:00

[None][fix] Update format to pass pre-commit check

0d4c986

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang force-pushed the user/williamj/deprecated-trt-backend-wave1 branch from 8ee7701 to 0d4c986 Compare June 30, 2026 07:00

Wanli-Jiang marked this pull request as ready for review June 30, 2026 07:05

Wanli-Jiang requested review from a team as code owners June 30, 2026 07:05

Wanli-Jiang requested review from 2ez4bz, SimengLiu-nv, laikhtewari, nv-guomingz and schetlur-nv June 30, 2026 07:05

Wanli-Jiang requested a review from QiJune June 30, 2026 07:09

nv-guomingz reviewed Jun 30, 2026

View reviewed changes

Wanli-Jiang closed this Jun 30, 2026

Wanli-Jiang mentioned this pull request Jul 3, 2026

[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend #15907

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[None][chore] Remove legacy TensorRT examples, docs and tests#15760

[None][chore] Remove legacy TensorRT examples, docs and tests#15760
Wanli-Jiang wants to merge 3 commits into
NVIDIA:mainfrom
Wanli-Jiang:user/williamj/deprecated-trt-backend-wave1

Wanli-Jiang commented Jun 30, 2026 •

edited

Loading

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

tensorrt-cicd commented Jun 30, 2026

Uh oh!

nv-guomingz Jun 30, 2026

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

tensorrt-cicd commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		### API Changes

		- [BREAKING CHANGE] TensorRT backend removed. PyTorch is now the sole execution backend. `LLM(backend="tensorrt")` now raises a `ValueError`; `TrtLlmArgs`, `tensorrt_llm._tensorrt_engine.LLM`, the `trtllm-build` / `trtllm-refit` / `trtllm-prune` CLIs, the `--backend tensorrt` CLI choice, and the per-model `convert_checkpoint.py` scripts have all been removed. The `tensorrt` pip dependency is no longer installed. See the [TensorRT Backend Removed migration guide](legacy/tensorrt-backend-removal.md) for details.

Uh oh!

Conversation

Wanli-Jiang commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TensorRT-Backend Removal — Step 1 Report (Examples / Docs / Tests)

Summary

Goal & strategy

Important: the draft was rebased onto current main

Commit1 — Remove TensorRT examples and docs

What & why

PyTorch content preserved (review corrections — see §"Review")

Commit2 — Remove TensorRT-backend tests and Triton TRT-engine paths

Tests deleted (145)

Triton backend — only the C++ TRT-engine backend removed (39 of 114 files)

Test-lists & new test (12 M, 1 A)

Audit method (how "TRT-only" was decided)

Review corrections

Validation

Known caveats / follow-ups

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

tensorrt-cicd commented Jun 30, 2026

Uh oh!

nv-guomingz Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

Wanli-Jiang commented Jun 30, 2026

Uh oh!

tensorrt-cicd commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Wanli-Jiang commented Jun 30, 2026 •

edited

Loading

Important: the draft was rebased onto current `main`