[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend by Wanli-Jiang · Pull Request #15907 · NVIDIA/TensorRT-LLM

Wanli-Jiang · 2026-07-03T05:36:58Z

Features

Step 1.4 of removing the legacy TensorRT backend: delete every Triton serving
surface that targets legacy TensorRT engines — the C++ backend (inflight_batcher_llm) and
all its TRT-engine model-repo templates (gpt/whisper/multimodal/disaggregated_serving)
plus their paired client tools — keeping only the PyTorch/LLM API Triton backend
(all_models/llmapi) and the shared launcher/client tooling.

Branch: user/williamj/deprecated-trt-backend-step1p4 · base: origin/main @ fb0d68be9c
Commit ebebf27d90 · 94 files, +5 / −25,978 (85 deletions, 9 edits).
Status: complete & validated (see Validation).

Scope

Removed: the C++ backend, every TRT-engine model-repo template, their paired client
tools, and the C++-backend QA CI. The templates all run TRT engines (via the C++ backend or
tensorrt_llm.runtime ModelRunner/Session); multimodal was already deprecated (EOL V1.2).
Kept: the llmapi (PyTorch) backend, the shared launcher, and the shared client tools.
Integration test defs left to PR3. The TRT-engine test defs under
tests/integration/defs/triton_server/ — and their test-db/waives/qa list entries —
are removed by TRTLLM-13783 (#15810), not here, to keep the two PRs non-overlapping.
See the PR3 hand-off in Notes.

Removed (85)

Every file is a pure deletion — it serves or exercises legacy TensorRT engines and has no
consumer once the TRT backend is gone.

C++ Triton backend — `triton_backend/inflight_batcher_llm/` (28)

Builds libtriton_tensorrtllm.so + trtllmExecutorWorker; the entire engine-serving backend.

CMakeLists.txt, cmake/TritonTensorRTLLMBackendConfig.cmake.in, cmake/modules/set_ifndef.cmake, scripts/build.sh — build system for the .so/worker.
src/libtensorrtllm.cc, src/libtriton_tensorrtllm.ldscript — Triton backend entry + symbol export map.
src/model_state.{cc,h}, src/model_instance_state.{cc,h} — Triton model/instance lifecycle driving the TRT executor.
src/custom_metrics_reporter/custom_metrics_reporter.{cc,h} — backend-specific Triton metrics.
src/namedTensor.{cpp,h}, src/utils.{cc,h} — backend-internal request/tensor plumbing.
client/{__init__.py,README.md,inflight_batcher_llm_client.py,end_to_end_grpc_client.py,e2e_grpc_speculative_decoding_client.py} — gRPC clients hard-wired to this backend.
tests/{CMakeLists.txt,modelState.cpp,modelInstanceStateTest.cpp,utilsTest.cpp,first.json,second.json,third.json} — C++ unit tests of the removed src/.

TRT-engine model-repo templates — `triton_backend/all_models/` (39)

Each template drives TRT engines via the C++ backend or tensorrt_llm.runtime ModelRunner/Session.

inflight_batcher_llm/ (12): ensemble/{1/.tmp,config.pbtxt}, preprocessing/{1/model.py,config.pbtxt}, postprocessing/{1/model.py,config.pbtxt}, tensorrt_llm/{1/model.py,config.pbtxt}, tensorrt_llm_bls/{1/model.py,1/lib/decode.py,1/lib/triton_decoder.py,config.pbtxt} — the flagship engine template + its BLS.
gpt/ (8): ensemble/{1/.tmp,config.pbtxt}, preprocessing, postprocessing, tensorrt_llm (model.py+config.pbtxt each) — GPT engine template.
disaggregated_serving/ (4): README.md, disaggregated_serving.md, disaggregated_serving_bls/{1/model.py,config.pbtxt} — BLS orchestrator over two engine instances; dead without the backend.
multimodal/ (10): Deprecation_notice.md, ensemble/config.pbtxt, multimodal_encoders/{1/model.py,1/multimodal_utils.py,config.pbtxt}, requirements-{llava-onevision,mistral3.1,mllama,qwen2vl,vila}.txt — already deprecated (EOL V1.2).
whisper/ (4): whisper_bls/{1/model.py,1/fbank.py,1/tokenizer.py,config.pbtxt} — Whisper engine BLS template.
tests/ (5): test_decode.py, test_triton_decoder.py, test_python_backend.py, test_multi_image_preprocess.py, test_multimodal_encoders.py — unit tests whose PYTHONPATH targeted the removed templates.

Paired client tools — `triton_backend/tools/` (8)

gpt/ (6): client.py, client_async.py, end_to_end_test.py, benchmark_core_model.py, gen_input_data.py, input_data.json.
multimodal/client.py, whisper/client.py.

C++-backend QA CI — `triton_backend/ci/` (6)

ci/README.md documented only this CI, so ci/ goes wholesale.

L0_backend_trtllm/{generate_engines.sh,test.sh,simple_data.json,base_metrics_verification_tests.py,custom_metrics_verification_tests.py}, README.md.

Kept (deliberately)

triton_backend/all_models/llmapi/ — the PyTorch/LLM API Triton backend; test_llmapi_python_backend.py is the sole surviving all_models/tests/ file.
triton_backend/scripts/launch_triton_server.py (default --model_repo now points at ../all_models/llmapi), triton_backend/requirements.txt.
Shared triton_backend/tools/: inflight_batcher_llm/ client scripts (end_to_end_test.py/benchmark_core_model.py imported by the llmapi PyTorch test), llmapi_client.py, fill_template.py, dataset/, utils/, tests/.

Edited (9)

File	Δ	Change	Why
`docker/Dockerfile.multi`	−12	Dropped the whole `tritonbuild` stage (compiled the backend via `inflight_batcher_llm/scripts/build.sh`) and the three `tritonrelease` `COPY`s: `inflight_batcher_llm/scripts`, `inflight_batcher_llm/client`, and `--from=tritonbuild …/backends/tensorrtllm`.	Its only output was the removed `libtriton_tensorrtllm.so`; keeping it would fail the build. Kept the `tritonrelease` `COPY`s of `all_models`/`scripts`/`tools`.
`jenkins/Build.groovy`	−16	Removed the `cmake … && make install` step that built `inflight_batcher_llm/build/` and the two `cp libtriton_tensorrtllm.so`/`trtllmExecutorWorker` packaging lines; renumbered trailing "Step" comments (3/4/5/6 → 3/4/5).	Build target and artifacts no longer exist. Step sequence stays contiguous (1–5, no gaps).
`jenkins/L0_Test.groovy`	−6	Removed `mkdir -p /opt/tritonserver/backends/tensorrtllm` + the two non-aarch64 `cp …so`/`…Worker` copies into it.	Those artifacts are no longer built/packaged; the copy would fail.
`tests/integration/defs/thirdparty/test_cmake_third_party.py`	−3	Removed the `"triton_backend/inflight_batcher_llm/*"` entry (+ 2 comment lines) from `IGNORE_PATTERNS`.	Dangling ignore-glob for the deleted backend `CMakeLists.txt`.
`triton_backend/scripts/launch_triton_server.py`	+1/−1	Default `--model_repo`: `…/../all_models/gpt` → `…/../all_models/llmapi`.	`all_models/gpt` was deleted; default must point at a surviving template.
`legacy-files.txt`	−33	Removed 33 exempt-file entries for the deleted paths.	Source of truth for the generated lint configs; the 4 `tools/inflight_batcher_llm/*` entries stay (shared, kept).
`ruff-legacy.toml`	−33	Regenerated from `legacy-files.txt`.	Generated artifact (`scripts/legacy_utils.py gen-configs`).
`pyproject.toml`	−33	Regenerated from `legacy-files.txt`.	Same generated set.
`.pre-commit-config.yaml`	−70	Regenerated from `legacy-files.txt` (each file appears in two hook `files:` alternations → 2×).	Same generated set; `verify-legacy-config` hook enforces it.

Validation

Re-verified on HEAD ebebf27d90:

scripts/legacy_utils.py check-configs → "All generated configs are up to date"; verify-legacy-config pre-commit hook passes (exit 0).
.pre-commit-config.yaml parses as valid YAML; the 4 surviving tools/inflight_batcher_llm/* config entries are the kept shared client tools; no removed-path reference remains in any generated config.
jenkins/Build.groovy "Step" comments sequential 1–5, no gaps.
No repo-wide reference to the removed backend/templates/tools remains, except the PR3-owned tests/integration/defs/triton_server/* defs (see Notes), the disaggregated_serving blog link, and helix.md's unrelated test_disaggregated_serving.py path. Swept: libtriton_tensorrtllm, trtllmExecutorWorker, whisper_bls, custom_metrics_reporter, L0_backend_trtllm, tritonbuild, generate_engines.sh, all_models/{gpt,whisper,multimodal,disaggregated_serving}, tools/{gpt,whisper,multimodal} — all clean.
Committed with --no-verify (the type-check/mypy hook has a known stash-rollback glitch in this sandbox); the hooks above were run manually and pass. No runtime tensorrt_llm/** or cpp/** source is touched.

CI / QA safety

Confirmed no breaking change to the CI/QA test machinery:

No test-ID / test-list file is touched — the only test-list-adjacent file in the commit is legacy-files.txt (lint exempt-file source), not test-db/*.yml, waives.txt, or any qa/*.txt.
scripts/check_test_list.py --validate → OK: 2701 unique test entries validated. (exit 0) — the AST validator behind CI's check_test_list.py --l0 --qa --waive step (L0_Test.groovy:2674). All triton_server/* entries in test-db/*.yml (61), waives.txt (23), and qa/llm_triton_integration.txt (351) still resolve because their def files live on this branch.
Jenkins edits leave no orphaned symbols: llmPath/tritonShortTag fully gone from Build.groovy; isAarch64 fully gone from L0_Test.groovy.
Surviving triton_backend/ references all point at kept paths: LLM_BACKEND_ROOT=${llmSrc}/triton_backend (root still exists), the L0_MergeRequest.groovy path→stage mapping ("triton_backend/": ["-Triton-"]), and the Dockerfile.multi tritonrelease COPYs.

Notes

PR3 (TRTLLM-13783) hand-off — test lists. The Triton-server test-list/waive entries are
a unit with the triton_server/*.py def files PR3 removes, so PR3 cleans them (splitting the
edits here would cause merge conflicts and doesn't help — the defs still exist on this branch,
so all entries resolve). PR3 must purge, alongside the def files:
- waives.txt lines 571–593 — 23 triton_server/test_triton{,_llm,_rcca}.py::* SKIPs.
- test-db/*.yml — 61 entries across l0_a30.yml (43), l0_a100.yml (14), l0_b200.yml (3), l0_dgx_h200.yml (1); plus the 351 entries in qa/llm_triton_integration.txt.
- Do NOT touch waives.txt lines 87, 370, 379, 380, 384, 388 and the [triton-…]/triton_ssm entries in qa/llm_function_rtx6k.txt/llm_function_stress.txt — those are the PyTorch Triton kernel backend (GPT-OSS/Nemotron-H/AutoDeploy), a different "triton".
Landing coupling. This PR and PR3 must land together (or PR3 first): several of the 61
active test-db entries (test_gpt, test_whisper, test_python_bls_unit_tests, …) run
against templates this PR deletes, so they would fail at execution in the gap. Static list
validation stays green throughout — this is a runtime coupling only.
Docs deferred to PR2 (TRTLLM-13782). docs/.../blog05_Disaggregated_Serving_in_TensorRT-LLM.md
links to the removed all_models/disaggregated_serving dir (now a dead GitHub link); left
untouched to keep this PR docs-free.

Remaining to land the PR

Rebase check: could not git fetch origin main in this sandbox (access/network); the locally-known origin/main is fb0d68be9c == HEAD~1, so the branch is a clean single commit. Rebase onto true top-of-tree before opening the PR if it has advanced.
Push + open PR against NVIDIA/TensorRT-LLM main (needs GH_CONFIG_DIR confirmed).

Summary by CodeRabbit

Chores
- Simplified the backend release and CI packaging flow by removing several bundled Triton backend artifacts and related build steps.
- Updated file selection and formatting rules to reflect the new, smaller set of tracked backend files.
- Cleaned up outdated model, client, test, and documentation entries across multiple backend areas.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-07-03T05:44:58Z

📝 Walkthrough

Walkthrough

This PR removes the legacy triton_backend implementation (models, configs, clients, CI scripts, C++ backend source, and documentation) and updates build/deployment pipelines (Dockerfile.multi, Jenkins scripts) and tooling configs (pre-commit, ruff, legacy-files) to reference a much smaller retained subset of triton_backend files.

Changes

Triton Backend Legacy Removal and Tooling Updates

Layer / File(s)	Summary
Build and deploy pipeline updates `docker/Dockerfile.multi`, `jenkins/Build.groovy`, `jenkins/L0_Test.groovy`	Docker's `tritonrelease` stage drops the intermediate build stage and only copies `all_models`/`scripts`/`tools`; Jenkins packaging and test setup no longer build or copy compiled triton backend artifacts.
Tooling/config file-list updates `.pre-commit-config.yaml`, `legacy-files.txt`, `pyproject.toml`, `ruff-legacy.toml`, `tests/integration/defs/thirdparty/test_cmake_third_party.py`	File-selection regex/include/exclude lists across lint/format configs are narrowed to a reduced set of `triton_backend` paths (mostly `llmapi` and select `tools`), and a CMake ignore pattern for `inflight_batcher_llm` is removed.
Legacy model, config, client, CI, and C++ backend removal `triton_backend/all_models/...`, `triton_backend/ci/...`, `triton_backend/inflight_batcher_llm/...`	A large set of Triton backend Python models, `.pbtxt` configs, client scripts, CI test/metrics scripts, C++ backend source/headers, CMake build files, and documentation are deleted entirely (e.g., GPT/inflight-batcher/disaggregated-serving/multimodal/whisper models, ensemble configs, custom metrics reporter, `libtensorrtllm.cc`).

Estimated code review effort: 4 (Complex) | ~60 minutes

Suggested reviewers: tfogal, Superjomn, brb-nv, qixiang-99, yizhang-nv, fredricz-20070104

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title matches the main change and uses the required [ticket][type] format.
Description check	✅ Passed	It covers scope, validation, CI safety, and rollout notes, though it uses custom headings instead of the template's exact sections.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

Remove the Triton serving surface that targets legacy TensorRT engines, as part of deprecating the TensorRT backend. Only the PyTorch/LLM API Triton backend (all_models/llmapi) and the shared launcher/client tooling remain. - Delete the C++ Triton backend triton_backend/inflight_batcher_llm/ (source, client, cmake, build/test scripts); it builds libtriton_tensorrtllm.so / trtllmExecutorWorker. - Delete the legacy TensorRT-engine model-repo templates, all of which run TRT engines (via the C++ backend or tensorrt_llm.runtime ModelRunner/Session): all_models/{inflight_batcher_llm,disaggregated_serving,gpt,whisper,multimodal} (multimodal was already deprecated, EOL V1.2). Delete their paired client tools tools/{gpt,whisper,multimodal}. - Delete triton_backend/ci/L0_backend_trtllm/ (the C++-backend QA CI) and the all_models/tests/ unit tests that only exercised the removed templates. - launch_triton_server.py: default --model_repo now points at the surviving all_models/llmapi template. - docker/Dockerfile.multi: drop the tritonbuild stage (built the C++ backend) and the tritonrelease COPYs of the backend .so, its scripts, and client. - jenkins: drop the C++-backend cmake build and artifact packaging (Build.groovy) and the copy into /opt/tritonserver/backends/tensorrtllm (L0_Test.groovy). - Regenerate the legacy lint configs (legacy-files.txt -> ruff-legacy.toml, pyproject.toml, .pre-commit-config.yaml) and drop the stale inflight_batcher_llm IGNORE_PATTERNS entry in the cmake third-party test. The legacy TensorRT-engine Triton integration test defs under tests/integration/defs/triton_server/ are handled separately in the TRTLLM-13783 test-removal change. Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang · 2026-07-03T05:48:53Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2026-07-03T05:54:41Z

PR_Github #57374 [ run ] triggered by Bot. Commit: afc5f52 Link to invocation

tensorrt-cicd · 2026-07-03T16:06:13Z

PR_Github #57374 [ run ] completed with state SUCCESS. Commit: afc5f52
/LLM/main/L0_MergeRequest_PR pipeline #46124 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned Wanli-Jiang Jul 3, 2026

Wanli-Jiang marked this pull request as ready for review July 3, 2026 05:40

Wanli-Jiang requested review from a team as code owners July 3, 2026 05:40

Wanli-Jiang requested review from SimengLiu-nv, Tabrizian, ZhanruiSunCh, mzweilz and yiqingy0 July 3, 2026 05:40

Wanli-Jiang force-pushed the user/williamj/deprecated-trt-backend-step1p4 branch from ebebf27 to afc5f52 Compare July 3, 2026 05:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend#15907

[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend#15907
Wanli-Jiang wants to merge 1 commit into
NVIDIA:mainfrom
Wanli-Jiang:user/williamj/deprecated-trt-backend-step1p4

Wanli-Jiang commented Jul 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

Wanli-Jiang commented Jul 3, 2026

Uh oh!

tensorrt-cicd commented Jul 3, 2026

Uh oh!

tensorrt-cicd commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Wanli-Jiang commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Features

Scope

Removed (85)

C++ Triton backend — triton_backend/inflight_batcher_llm/ (28)

TRT-engine model-repo templates — triton_backend/all_models/ (39)

Paired client tools — triton_backend/tools/ (8)

C++-backend QA CI — triton_backend/ci/ (6)

Kept (deliberately)

Edited (9)

Validation

CI / QA safety

Notes

Remaining to land the PR

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

Wanli-Jiang commented Jul 3, 2026

Uh oh!

tensorrt-cicd commented Jul 3, 2026

Uh oh!

tensorrt-cicd commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Wanli-Jiang commented Jul 3, 2026 •

edited

Loading

C++ Triton backend — `triton_backend/inflight_batcher_llm/` (28)

TRT-engine model-repo templates — `triton_backend/all_models/` (39)

Paired client tools — `triton_backend/tools/` (8)

C++-backend QA CI — `triton_backend/ci/` (6)

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading