
[None][chore] Remove legacy TensorRT examples, docs and tests#15760

Closed
Wanli-Jiang wants to merge 3 commits into NVIDIA:main from Wanli-Jiang:user/williamj/deprecated-trt-backend-wave1


Wanli-Jiang (Collaborator) commented Jun 30, 2026

TensorRT-Backend Removal — Step 1 Report (Examples / Docs / Tests)

Summary

This is step 1 of removing the legacy TensorRT backend from TensorRT-LLM,
carved out of Freddy's draft PR (QiJune/TensorRT-LLM#20)
as two small, independently reviewable, always-buildable PRs:

| PR | Commit | Scope | Files | Insertions / Deletions |
|---|---|---|---|---|
| commit1 | [None][chore] Remove TensorRT examples and docs, add migration guide | `examples/`, `docs/` | 206 | +156 / −40,632 |
| commit2 | [None][test] Remove TensorRT-backend tests and Triton TRT-engine paths | `tests/`, `triton_backend/` | 197 | +339 / −55,527 |

Key guarantee: neither commit touches Python or C++ source under `tensorrt_llm/**` or
`cpp/**`. They only remove or trim examples, docs, and tests, plus the C++ Triton
TRT-engine backend under `triton_backend/`, which lives outside both trees. So neither PR
can change runtime behavior or break the package build or `import tensorrt_llm`. The
actual source removal (Python modules, C++ nvinfer1 decoupling, rejection of
`backend="tensorrt"` and `--backend tensorrt`) is sequenced into later PRs.


Goal & strategy

The draft removes the TensorRT backend wholesale (~1,700 files, +31k/−334k) in three
commits. That is far too large for one review, so we split it into a chain of small
PRs, each of which leaves main in a buildable, importable, test-collectable state:

  • Lead with pure deletions (examples, docs, TRT-only tests) — removing these can
    never break a source build, so they de-risk the rest and shrink the surface first.
  • One concern per PR; Python-source and C++ removal come in later, separately-reviewed PRs.

Important: the draft was rebased onto current main

The draft branch is based on a main that is now ~165 commits old. A naive
`git diff main..draft` is therefore contaminated: it shows recently added PyTorch
tests (deepseek_v4, qwen_image, etc.) as "deletions" that have nothing to do with
TensorRT. All file selection here was done against the draft's merge-base and then
replayed onto current main, and the test-list cleanup was re-derived against
current main (the draft's list edits no longer apply).
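A toy model of why the tip-to-tip diff misleads, treating each snapshot as a set of paths (all paths here are made up for illustration). `git diff A..B` compares the two tips, while `git diff A...B` compares the merge-base of A and B against B, which is the view the file selection needs:

```python
def deleted(old_files, new_files):
    """Paths present in `old_files` but missing from `new_files`, sorted."""
    return sorted(set(old_files) - set(new_files))


# Hypothetical snapshots.
merge_base = {"README.md", "tensorrt_llm/__init__.py", "tests/unittest/trt/test_x.py"}
# main kept moving after the merge-base and gained a new PyTorch test.
main = merge_base | {"tests/unittest/_torch/test_new_model.py"}
# The draft only deletes a TRT-only test relative to the merge-base.
draft = merge_base - {"tests/unittest/trt/test_x.py"}

# Tip-to-tip, like `git diff main..draft`: the new PyTorch test looks "deleted".
naive = deleted(main, draft)
# Merge-base to draft, like `git diff main...draft`: only the draft's own deletions.
true_deletions = deleted(merge_base, draft)

print(naive)           # ['tests/unittest/_torch/test_new_model.py', 'tests/unittest/trt/test_x.py']
print(true_deletions)  # ['tests/unittest/trt/test_x.py']
```

Replaying the draft onto current main then means applying `true_deletions`, never `naive`.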


Commit1 — Remove TensorRT examples and docs

206 files: 194 deleted, 9 modified, 2 renamed, 1 added.

What & why

| Action | Files | Why |
|---|---|---|
| Delete TRT example dirs | 149 under `examples/` | The per-model TensorRT workflow (`convert_checkpoint.py` → `trtllm-build` → engine → `run.py`): `examples/models/core/*` and `contrib/*`, `eagle/`, `medusa/`, `redrafter/`, `lookahead/`, `draft_target_model/`, `examples/llm-api/_tensorrt_engine/`. PyTorch loads HF checkpoints directly, with no convert/build step. |
| Delete TRT docs | 45 under `docs/source/legacy/` (+ `commands/trtllm-build.rst`) | `docs/source/legacy/` was the repo's pre-existing TensorRT-era doc tree (TRT Python API autodoc, GPT attention plugin, graph rewriting, checkpoint format, build workflow, etc.). Each page is TRT-only or already duplicated in the current `features/`, `torch/`, `models/`, and `developer-guide/` docs. |
| Add migration guide | `docs/source/legacy/tensorrt-backend-removal.md` | Documents the breaking change (`backend="tensorrt"` → `ValueError`; removed `TrtLlmArgs`, `trtllm-build`, `trtllm-refit`, `trtllm-prune`; dropped `tensorrt` pip dependency) and how to migrate. |
| Modify surviving docs | 6 files (`index.rst`, `release-notes.md`, `torch/arch_overview.md`, `torch/kv_cache_manager.md`, 2 blogs) | Strip TensorRT references and fix the toctree. |
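For readers of the migration guide, the documented behavior amounts to a backend check like the one below. This is a hypothetical sketch (`resolve_backend` and the error text are invented here); the real check in `llm_args.py` lands in a later PR of the chain and may differ:

```python
# Hypothetical sketch of the post-removal backend check; not TensorRT-LLM's actual code.
REMOVED_BACKENDS = {"tensorrt"}
SUPPORTED_BACKENDS = {"pytorch"}  # assumption: other backends omitted for brevity


def resolve_backend(backend=None):
    """Return the backend to run, rejecting the removed TensorRT backend."""
    if backend is None:
        return "pytorch"  # PyTorch is the sole execution backend after the removal
    if backend in REMOVED_BACKENDS:
        raise ValueError(
            f"backend={backend!r} is no longer supported: the TensorRT backend was "
            "removed. Use the PyTorch backend instead; see "
            "docs/source/legacy/tensorrt-backend-removal.md."
        )
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    return backend


try:
    resolve_backend("tensorrt")
except ValueError as err:
    print(err)
```

In user code the migration is mostly subtractive: drop `backend="tensorrt"` and the convert/build step, and point the LLM API at the HF checkpoint directly.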

PyTorch content preserved (see "Review corrections" below)

| Action | File(s) | Why |
|---|---|---|
| Kept (restored) | `examples/models/core/phi/phi4-mm.md` | Pure PyTorch doc (Phi-4-multimodal via `quickstart_multimodal.py`, `--backend pytorch`). |
| Trimmed to PyTorch-only | `examples/models/core/{qwen,gemma,nemotron_nas}/README.md` | Were MIXED; kept only their PyTorch sections (Qwen3/Qwen3-Next, Gemma 4, Nemotron-NAS pointer) and dropped the TRT convert/build bodies. |
| Relocated (rename) | `legacy/advanced/lowprecision-pcie-allreduce.md` → `docs/source/features/lowprecision-allreduce.md` (+ its image) | Documents the `LOWPRECISION` AllReduce strategy, which is live in the PyTorch backend (`_torch/model_config.py`, `_torch/distributed/ops.py`, `llmapi/llm_args.py`) and was undocumented elsewhere. Reframed around `LlmArgs.allreduce_strategy` and added to the Features toctree. |

examples/models/core/llama/README.md was dropped (not trimmed): its PyTorch
content (Llama-3.3-70B FP8 serving + benchmarking) is fully covered, more
comprehensively, by docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md.
examples/llm-api/, examples/auto_deploy/, examples/serve/, and
examples/models/core/multimodal/ were never touched.


Commit2 — Remove TensorRT-backend tests and Triton TRT-engine paths

197 files: 184 deleted, 12 modified, 1 added.

Tests deleted (145)

All of them test code that the TensorRT removal deletes, so no PyTorch or shared coverage is lost.

| Group | Count | Why TRT-only |
|---|---|---|
| `tests/unittest/trt/**` (functional, attention, quantization, model, model_api, python_plugin) | 120 | Test `tensorrt_llm.functional`/`network`/`builder`, TRT plugins, and TRT `PretrainedModel` engine builds |
| `tests/unittest/others/{test_builder,test_module,test_plugins,test_session,test_layer,test_graph_rewriter,test_kv_cache_manager,test_model_dtype,test_debugging_api,test_precision_control}.py` | 10 | Import `tensorrt`, `tensorrt_llm.Builder`/`module`/`plugin`/`graph_rewriting`, or `runtime.{Session,kv_cache_manager}` |
| `tests/integration/defs/triton_server/{build_engines,test_triton_llm,test_triton_memleak,test_triton_multi_node,test_triton_rcca}.py` | 5 | Triton TRT-engine harness (`convert_checkpoint.py` + `trtllm-build`) |
| `llmapi/{test_build_cache,test_llm_models,test_llm_multi_gpu}.py` | 3 | Engine-build cache, `_tensorrt_engine.LLM`, `BuildConfig` engine builds |
| `tools/plugin_gen/{test_core,test_plugin_gen,test_shape_infer}.py` | 3 | TRT plugin-generator tool |
| `test_model_runner_cpp.py`, `utils/test_medusa_utils.py`, 2 `_torch/.../tests_lora_modules/*_vs_trt*.py` | 4 | `ModelRunnerCpp` over a TRT engine; TRT Medusa utils; LoRA tests whose reference is a built TRT engine (PyTorch LoRA stays covered by `tests/unittest/_torch/lora/` and the surviving sanity tests) |

Triton backend — only the C++ TRT-engine backend removed (39 of 114 files)

The triton_backend/ folder is not removed wholesale.

| Deleted | What it is |
|---|---|
| `inflight_batcher_llm/{src,client,cmake,scripts,tests}`, `CMakeLists.txt` | The C++ Triton backend (`libtriton_tensorrtllm.so`) that loads a serialized `.engine` and runs it via the C++ runtime (`GptManager`/`Executor`). It links the C++ TRT-engine runtime that the later C++ decoupling deletes, so once that lands it can no longer compile or run. |
| `all_models/inflight_batcher_llm/tensorrt_llm/{config.pbtxt,1/model.py}` | The Triton model template targeting that C++ backend |
| `ci/L0_backend_trtllm/*`, `ci/README.md` | CI that built engines (`convert_checkpoint.py`/`trtllm-build`) and tested the C++ backend |

Kept (75 files): the Python llmapi Triton backend (all_models/llmapi/tensorrt_llm,
driven by the LLM API / PyExecutor — the PyTorch Triton path), tools/,
scripts/launch_triton_server.py, and the shared model templates.

Test-lists & new test (12 M, 1 A)

| Action | Files | Why |
|---|---|---|
| Modify | 9 × `test_lists/test-db/l0_*.yml`, `waives.txt`, `qa/llm_triton_integration.txt`, `triton_server/test_triton.py` | Remove now-dangling entries for the deleted tests. The cleanup was re-derived against current main by checking each referenced path against the filesystem, so no surviving test was touched. `llm_triton_integration.txt` now lists the 6 `test_triton_llmapi.py` cases. |
| Add | `triton_server/test_triton_llmapi.py` | PyTorch-path replacement for the removed TRT-engine Triton tests: it exercises the kept llmapi backend (`test_llmapi_backend`, `test_llmapi_lora`, `test_llmapi_backend_multi_instance`) and carries an NVIDIA header and module docstring. Without it, the surviving Triton serving path would have zero integration coverage. |
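The "drop only what no longer exists on disk" rule can be sketched as below. The entry format (a path, optionally followed by `::test_id` and trailing options such as a timeout) is an assumption for illustration, as are the helper names and paths:

```python
import tempfile
from pathlib import Path


def entry_path(entry):
    """Return the path part of an entry like 'unittest/a/test_b.py::test_c TIMEOUT (30)'."""
    first_token = entry.split()[0]        # drop trailing options
    return first_token.split("::", 1)[0]  # drop the pytest node id


def prune_dangling(entries, tests_root):
    """Split non-blank entries into (kept, dropped) by whether their path exists."""
    root = Path(tests_root)
    kept, dropped = [], []
    for entry in entries:
        exists = (root / entry_path(entry)).exists()
        (kept if exists else dropped).append(entry)
    return kept, dropped


# Demo on a throwaway tree with one surviving test file (hypothetical paths).
with tempfile.TemporaryDirectory() as root:
    survivor = Path(root, "unittest/_torch/test_survivor.py")
    survivor.parent.mkdir(parents=True)
    survivor.touch()
    entries = [
        "unittest/_torch/test_survivor.py::test_ok",
        "unittest/trt/functional/test_removed.py",
        "unittest/trt/attention TIMEOUT (90)",
    ]
    kept, dropped = prune_dangling(entries, root)

print(kept)     # ['unittest/_torch/test_survivor.py::test_ok']
print(dropped)  # ['unittest/trt/functional/test_removed.py', 'unittest/trt/attention TIMEOUT (90)']
```

Keying the decision on the filesystem rather than on the draft's hunks is what makes the cleanup safe to redo on a main that has moved ~165 commits.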

Audit method (how "TRT-only" was decided)

Every deleted file was classified TRT-ONLY / KEEP-PYTORCH / MIXED by reading its
original content from main and checking concrete signals — imports (import tensorrt,
_tensorrt_engine, runtime.Session, functional.*), --backend/backend= values,
convert_checkpoint.py/trtllm-build usage, and references to tensorrt_llm/_torch/.
This was run across all deleted examples (153), tests (145), and legacy docs (47).
The review (below) is what caught the initial over-deletion.
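A first-pass version of that classification can be sketched with a few regexes over each file's text. The patterns below are assumptions that approximate the signals listed above; a regex pass only triages, and anything MIXED or surprising still needs a human read, as the review corrections show:

```python
import re

# Approximations of the audit signals; the exact patterns are assumptions.
TRT_SIGNALS = [
    re.compile(r"^\s*import tensorrt\b", re.MULTILINE),  # `import tensorrt`, not tensorrt_llm
    re.compile(r"\b_tensorrt_engine\b"),                  # engine-based LLM API
    re.compile(r"\bruntime\.Session\b"),                  # TRT engine session
    re.compile(r"\btensorrt_llm\.functional\b"),          # TRT graph-building ops
    re.compile(r"\bconvert_checkpoint\.py\b"),            # per-model checkpoint converter
    re.compile(r"\btrtllm-build\b"),                      # engine builder CLI
    re.compile(r"""(--backend[ =]|backend=)["']?tensorrt\b"""),
]
PYTORCH_SIGNALS = [
    re.compile(r"\btensorrt_llm[./]_torch\b"),            # PyTorch backend package
    re.compile(r"""(--backend[ =]|backend=)["']?pytorch\b"""),
]


def classify(text):
    """Return 'TRT-ONLY', 'KEEP-PYTORCH', or 'MIXED' for one file's contents."""
    trt = any(p.search(text) for p in TRT_SIGNALS)
    torch = any(p.search(text) for p in PYTORCH_SIGNALS)
    if trt and torch:
        return "MIXED"
    if trt:
        return "TRT-ONLY"
    # No TRT signal: keep the file (the safe default when in doubt).
    return "KEEP-PYTORCH"


readme = "python convert_checkpoint.py --model_dir ./hf\ntrtllm-build --checkpoint_dir ./ckpt\n"
mixed = readme + "python quickstart_multimodal.py --backend pytorch\n"
print(classify(readme), classify(mixed))  # TRT-ONLY MIXED
```

Note the deliberately narrow `tensorrt_llm.functional` pattern: a bare `functional.` match would also flag ordinary PyTorch code such as `torch.nn.functional`.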


Review corrections

The first cut deleted at the directory level and removed PyTorch content. The audit
corrected it:

  • Examples: restored phi4-mm.md; trimmed the qwen/gemma/nemotron_nas READMEs to
    PyTorch-only; dropped llama (covered by its deployment guide). The other 117 scripts
    and 31 READMEs were confirmed TRT-only.
  • Tests: all 145 deletions confirmed TRT-only; none restored.
  • Docs: only lowprecision-pcie-allreduce.md documented live PyTorch functionality
    absent elsewhere, so it was relocated. The other 45 are TRT-only or duplicated.

Validation

  • Both commits touch 0 files under tensorrt_llm/** or cpp/**.
  • All 9 modified test-db YAMLs parse; test-list entries removed only where the
    referenced path no longer exists on disk.
  • No surviving test imports a deleted module; test_triton_llmapi.py collects (its
    helpers prepare_llmapi_model_repo / set_llmapi_decoupled_mode exist in common.py).
  • Trimmed READMEs have no dangling internal anchors or relative links.
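The import check in the list above can be approximated with the standard-library `ast` module. A sketch, assuming the deleted modules are given as dotted names; relative imports are matched by their written name only (a simplification), and `imports_of_deleted` is a hypothetical helper:

```python
import ast


def imported_modules(source):
    """Yield dotted module names imported by a Python source string."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom):
            # Relative imports are reported by their written name only.
            base = node.module or ""
            if base:
                yield base
            for alias in node.names:
                if alias.name != "*":
                    # `from pkg import sub` may import a submodule, so report pkg.sub too.
                    yield f"{base}.{alias.name}" if base else alias.name


def imports_of_deleted(source, deleted_modules):
    """Return imported names that are, or live under, one of `deleted_modules`."""
    hits = {
        name
        for name in imported_modules(source)
        for dead in deleted_modules
        if name == dead or name.startswith(dead + ".")
    }
    return sorted(hits)


# Hypothetical surviving test that still pulls in a deleted helper module.
surviving_test = """
import pytest
from .common import prepare_llmapi_model_repo
from .build_engines import *
"""
print(imports_of_deleted(surviving_test, {"build_engines"}))  # ['build_engines']
```

Parsing rather than importing keeps the check safe to run in an environment where the deleted modules, or their TensorRT dependencies, are already gone.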

Known caveats / follow-ups

  • pre-commit hooks could not run in the authoring sandbox (can't build their
    virtualenvs); commits used --no-verify. Run hooks before pushing.
  • The kept all_models/inflight_batcher_llm/{ensemble,tensorrt_llm_bls,pre/postprocessing}
    previously chained into the now-deleted tensorrt_llm C++ model, so that ensemble's
    tensorrt_llm step is orphaned. Left for a follow-up that rewires the BLS to the
    llmapi backend (out of scope for this TRT-engine-removal PR).
  • These are the first two PRs of the chain. Remaining: extract shared Python symbols,
    delete TRT Python modules + reject backend="tensorrt", and decouple the C++ tree
    from nvinfer1.
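Until that follow-up lands, a quick scan can flag which ensemble steps point at a model that no longer exists. A rough sketch, assuming Triton's text-format `config.pbtxt` where each `ensemble_scheduling` step names its model with `model_name: "..."`; the regex is a shortcut rather than a protobuf parser, and the helper name and trimmed config are hypothetical:

```python
import re

# Matches `model_name: "..."` fields inside ensemble steps (simplified).
STEP_MODEL = re.compile(r'\bmodel_name:\s*"([^"]+)"')


def orphaned_steps(config_text, available_models):
    """Return ensemble step targets that are missing from the model repository."""
    return [name for name in STEP_MODEL.findall(config_text)
            if name not in available_models]


# Trimmed, illustrative ensemble config (not the real template).
ENSEMBLE_CONFIG = """
name: "ensemble"
platform: "ensemble"
ensemble_scheduling {
  step [
    { model_name: "preprocessing" model_version: -1 },
    { model_name: "tensorrt_llm" model_version: -1 },
    { model_name: "postprocessing" model_version: -1 }
  ]
}
"""

# Model directories kept under all_models/inflight_batcher_llm/ by this PR.
kept_models = {"ensemble", "preprocessing", "postprocessing", "tensorrt_llm_bls"}

print(orphaned_steps(ENSEMBLE_CONFIG, kept_models))  # ['tensorrt_llm']
```

This only covers the ensemble config; the BLS path needs its own check.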

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Remove legacy TensorRT-backend example directories and documentation pages,
add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop
TensorRT references from the surviving docs (index, release notes, torch
arch/kv-cache overviews, and two blogs).

Pure docs/examples change: no Python or C++ source is touched, so the package
build and import are unaffected. First of the incremental PRs that split
Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/
runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and
tools/plugin_gen) and the TensorRT-engine Triton backend paths under
triton_backend/, plus the deleted triton_server integration tests.

Re-derive the test-list cleanup against current main (Freddy's draft predates
main by ~165 commits, so his hunks no longer apply): drop the now-dangling
entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT
waives, and replace qa/llm_triton_integration.txt with the llmapi-backend
triton tests. Entries were removed only where the referenced test path no
longer exists on disk, so no surviving PyTorch test is affected.

No Python or C++ source is touched. Second of the incremental PRs splitting
Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Wanli-Jiang force-pushed the user/williamj/deprecated-trt-backend-wave1 branch from 8ee7701 to 0d4c986 on June 30, 2026 07:00
Wanli-Jiang marked this pull request as ready for review June 30, 2026 07:05
Wanli-Jiang requested review from a team as code owners June 30, 2026 07:05
Wanli-Jiang (Collaborator Author)

/bot run --disable-fail-fast

Wanli-Jiang requested a review from QiJune June 30, 2026 07:09

tensorrt-cicd (Collaborator)

PR_Github #56540 [ run ] triggered by Bot. Commit: 0d4c986 Link to invocation


### API Changes

- **[BREAKING CHANGE] TensorRT backend removed.** PyTorch is now the sole execution backend. `LLM(backend="tensorrt")` now raises a `ValueError`; `TrtLlmArgs`, `tensorrt_llm._tensorrt_engine.LLM`, the `trtllm-build` / `trtllm-refit` / `trtllm-prune` CLIs, the `--backend tensorrt` CLI choice, and the per-model `convert_checkpoint.py` scripts have all been removed. The `tensorrt` pip dependency is no longer installed. See the [TensorRT Backend Removed migration guide](legacy/tensorrt-backend-removal.md) for details.

Collaborator

We need to update llm_args.py too.

Wanli-Jiang (Collaborator Author)

will split to smaller PRs.

Wanli-Jiang (Collaborator Author)

step 1.1 PR: #15763

tensorrt-cicd (Collaborator)

PR_Github #56540 [ run ] completed with state FAILURE. Commit: 0d4c986
/LLM/main/L0_MergeRequest_PR pipeline #45376 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

