
[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend#15907

Open
Wanli-Jiang wants to merge 1 commit into NVIDIA:main from Wanli-Jiang:user/williamj/deprecated-trt-backend-step1p4


Wanli-Jiang commented Jul 3, 2026

Collaborator

Features

Step 1.4 of removing the legacy TensorRT backend: delete every Triton serving
surface that targets legacy TensorRT engines — the C++ backend (inflight_batcher_llm) and
all its TRT-engine model-repo templates (gpt/whisper/multimodal/disaggregated_serving)
plus their paired client tools — keeping only the PyTorch/LLM API Triton backend
(all_models/llmapi) and the shared launcher/client tooling.

  • Branch: user/williamj/deprecated-trt-backend-step1p4 · base: origin/main @ fb0d68be9c
  • Commit ebebf27d90 · 94 files, +5 / −25,978 (85 deletions, 9 edits).
  • Status: complete & validated (see Validation).

Scope

  • Removed: the C++ backend, every TRT-engine model-repo template, their paired client
    tools, and the C++-backend QA CI. The templates all run TRT engines (via the C++ backend or
    tensorrt_llm.runtime ModelRunner/Session); multimodal was already deprecated (EOL V1.2).
  • Kept: the llmapi (PyTorch) backend, the shared launcher, and the shared client tools.
  • Integration test defs left to PR3. The TRT-engine test defs under
    tests/integration/defs/triton_server/ — and their test-db/waives/qa list entries —
    are removed by TRTLLM-13783 (#15810), not here, to keep the two PRs non-overlapping.
    See the PR3 hand-off in Notes.

Removed (85)

Every file is a pure deletion — it serves or exercises legacy TensorRT engines and has no
consumer once the TRT backend is gone.

C++ Triton backend — triton_backend/inflight_batcher_llm/ (28)

Builds libtriton_tensorrtllm.so + trtllmExecutorWorker; the entire engine-serving backend.

  • CMakeLists.txt, cmake/TritonTensorRTLLMBackendConfig.cmake.in, cmake/modules/set_ifndef.cmake, scripts/build.sh — build system for the .so/worker.
  • src/libtensorrtllm.cc, src/libtriton_tensorrtllm.ldscript — Triton backend entry + symbol export map.
  • src/model_state.{cc,h}, src/model_instance_state.{cc,h} — Triton model/instance lifecycle driving the TRT executor.
  • src/custom_metrics_reporter/custom_metrics_reporter.{cc,h} — backend-specific Triton metrics.
  • src/namedTensor.{cpp,h}, src/utils.{cc,h} — backend-internal request/tensor plumbing.
  • client/{__init__.py,README.md,inflight_batcher_llm_client.py,end_to_end_grpc_client.py,e2e_grpc_speculative_decoding_client.py} — gRPC clients hard-wired to this backend.
  • tests/{CMakeLists.txt,modelState.cpp,modelInstanceStateTest.cpp,utilsTest.cpp,first.json,second.json,third.json} — C++ unit tests of the removed src/.

TRT-engine model-repo templates: triton_backend/all_models/ (43)

Each template drives TRT engines via the C++ backend or tensorrt_llm.runtime ModelRunner/Session.

  • inflight_batcher_llm/ (12): ensemble/{1/.tmp,config.pbtxt}, preprocessing/{1/model.py,config.pbtxt}, postprocessing/{1/model.py,config.pbtxt}, tensorrt_llm/{1/model.py,config.pbtxt}, tensorrt_llm_bls/{1/model.py,1/lib/decode.py,1/lib/triton_decoder.py,config.pbtxt} — the flagship engine template + its BLS.
  • gpt/ (8): ensemble/{1/.tmp,config.pbtxt}, preprocessing, postprocessing, tensorrt_llm (model.py+config.pbtxt each) — GPT engine template.
  • disaggregated_serving/ (4): README.md, disaggregated_serving.md, disaggregated_serving_bls/{1/model.py,config.pbtxt} — BLS orchestrator over two engine instances; dead without the backend.
  • multimodal/ (10): Deprecation_notice.md, ensemble/config.pbtxt, multimodal_encoders/{1/model.py,1/multimodal_utils.py,config.pbtxt}, requirements-{llava-onevision,mistral3.1,mllama,qwen2vl,vila}.txt — already deprecated (EOL V1.2).
  • whisper/ (4): whisper_bls/{1/model.py,1/fbank.py,1/tokenizer.py,config.pbtxt} — Whisper engine BLS template.
  • tests/ (5): test_decode.py, test_triton_decoder.py, test_python_backend.py, test_multi_image_preprocess.py, test_multimodal_encoders.py — unit tests whose PYTHONPATH targeted the removed templates.

Paired client tools — triton_backend/tools/ (8)

  • gpt/ (6): client.py, client_async.py, end_to_end_test.py, benchmark_core_model.py, gen_input_data.py, input_data.json.
  • multimodal/client.py, whisper/client.py.

C++-backend QA CI — triton_backend/ci/ (6)

ci/README.md documented only this CI, so the whole ci/ directory is removed.

  • L0_backend_trtllm/{generate_engines.sh,test.sh,simple_data.json,base_metrics_verification_tests.py,custom_metrics_verification_tests.py}, README.md.
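The listings above use shell-style brace groups such as src/model_state.{cc,h} to abbreviate several files per entry. The sketch below shows one way to expand those groups so that per-section file counts can be rechecked mechanically. The function name expand_braces is hypothetical and is not part of the repo. The sketch also assumes that the entries never nest braces, which holds for every entry listed here.

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand shell-style {a,b,...} groups into concrete paths.

    Handles several groups in one pattern and empty options such as
    "test_triton{,_llm}.py", but not nested braces.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any later groups in `tail` are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


# Three entries from the C++ backend listing above.
section = [
    "src/model_state.{cc,h}",
    "src/model_instance_state.{cc,h}",
    "client/{__init__.py,README.md,inflight_batcher_llm_client.py,"
    "end_to_end_grpc_client.py,e2e_grpc_speculative_decoding_client.py}",
]
files = [path for entry in section for path in expand_braces(entry)]
print(len(files))  # 9
```

Applying the same expansion to every bullet in a section and summing the results reproduces the parenthesized counts in the headings.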

Kept (deliberately)

  • triton_backend/all_models/llmapi/ — the PyTorch/LLM API Triton backend; test_llmapi_python_backend.py is the sole surviving all_models/tests/ file.
  • triton_backend/scripts/launch_triton_server.py (default --model_repo now points at ../all_models/llmapi), triton_backend/requirements.txt.
  • Shared triton_backend/tools/: inflight_batcher_llm/ client scripts (end_to_end_test.py/benchmark_core_model.py imported by the llmapi PyTorch test), llmapi_client.py, fill_template.py, dataset/, utils/, tests/.
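The new default for launch_triton_server.py can be illustrated with the minimal argparse sketch below. It assumes the default is resolved relative to the launcher's own directory (scripts/ → ../all_models/llmapi). The real script's argument-parsing code is not shown in this PR, so build_parser and the /repo path are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser(script_dir: Path) -> argparse.ArgumentParser:
    """Hypothetical launcher parser whose default repo is the llmapi template."""
    parser = argparse.ArgumentParser(description="Launch Triton (sketch)")
    parser.add_argument(
        "--model_repo",
        # Resolved relative to the launcher's own directory:
        # triton_backend/scripts/../all_models/llmapi
        default=str(script_dir / ".." / "all_models" / "llmapi"),
        help="Triton model repository to serve",
    )
    return parser


# In a real script, script_dir would typically be Path(__file__).resolve().parent.
args = build_parser(Path("/repo/triton_backend/scripts")).parse_args([])
print(args.model_repo)
```

Keeping the default relative to the script means it keeps working wherever the repo is checked out. An explicit --model_repo still overrides it.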

Edited (9)

| File | Δ | Change | Why |
|---|---|---|---|
| docker/Dockerfile.multi | −12 | Dropped the whole tritonbuild stage (compiled the backend via inflight_batcher_llm/scripts/build.sh) and the three tritonrelease COPYs: inflight_batcher_llm/scripts, inflight_batcher_llm/client, and --from=tritonbuild …/backends/tensorrtllm. Kept the tritonrelease COPYs of all_models/scripts/tools. | The stage's only output was the removed libtriton_tensorrtllm.so; keeping it would fail the build. |
| jenkins/Build.groovy | −16 | Removed the cmake … && make install step that built inflight_batcher_llm/build/ and the two cp libtriton_tensorrtllm.so/trtllmExecutorWorker packaging lines; renumbered the trailing "Step" comments (3/4/5/6 → 3/4/5). | The build target and artifacts no longer exist. The Step sequence stays contiguous (1–5, no gaps). |
| jenkins/L0_Test.groovy | −6 | Removed mkdir -p /opt/tritonserver/backends/tensorrtllm and the two non-aarch64 cp …so/…Worker copies into it. | Those artifacts are no longer built or packaged, so the copy would fail. |
| tests/integration/defs/thirdparty/test_cmake_third_party.py | −3 | Removed the "triton_backend/inflight_batcher_llm/*" entry (plus 2 comment lines) from IGNORE_PATTERNS. | Dangling ignore glob for the deleted backend CMakeLists.txt. |
| triton_backend/scripts/launch_triton_server.py | +1/−1 | Default --model_repo: …/../all_models/gpt → …/../all_models/llmapi. | all_models/gpt was deleted; the default must point at a surviving template. |
| legacy-files.txt | −33 | Removed 33 exempt-file entries for the deleted paths. | Source of truth for the generated lint configs; the 4 tools/inflight_batcher_llm/* entries stay (shared, kept). |
| ruff-legacy.toml | −33 | Regenerated from legacy-files.txt. | Generated artifact (scripts/legacy_utils.py gen-configs). |
| pyproject.toml | −33 | Regenerated from legacy-files.txt. | Same generated set. |
| .pre-commit-config.yaml | −70 | Regenerated from legacy-files.txt (each path appears in the files: alternations of two hooks, hence roughly 2×). | Same generated set; the verify-legacy-config hook enforces it. |
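legacy-files.txt is the source of truth, and ruff-legacy.toml, pyproject.toml and .pre-commit-config.yaml are regenerated from it by scripts/legacy_utils.py gen-configs. That script's internals are not part of this PR, so the following is only a hypothetical sketch of the idea. It parses the path list and builds one anchored regex alternation of the kind a pre-commit files:/exclude: key accepts. The names load_entries and files_regex are invented, and the "#"-comment format of the list is an assumption.

```python
import re


def load_entries(text: str) -> list[str]:
    """Parse one path per line, skipping blanks and '#' comments (assumed format)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def files_regex(entries: list[str]) -> str:
    """Build one anchored alternation, the shape a pre-commit files:/exclude: key takes."""
    return "^(" + "|".join(re.escape(path) for path in sorted(entries)) + ")$"


before = load_entries("""
# hypothetical excerpt
triton_backend/all_models/gpt/preprocessing/1/model.py
triton_backend/tools/inflight_batcher_llm/end_to_end_test.py
""")
removed = {"triton_backend/all_models/gpt/preprocessing/1/model.py"}
after = [path for path in before if path not in removed]

pattern = re.compile(files_regex(after))
print(bool(pattern.search("triton_backend/tools/inflight_batcher_llm/end_to_end_test.py")))  # True
print(bool(pattern.search("triton_backend/all_models/gpt/preprocessing/1/model.py")))  # False
```

Because every generated config derives from the one list, deleting an entry there and regenerating keeps the derived files consistent. This consistency is what the check-configs step and the verify-legacy-config hook confirm.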

Validation

Re-verified on HEAD ebebf27d90:

  • scripts/legacy_utils.py check-configs → "All generated configs are up to date"; verify-legacy-config pre-commit hook passes (exit 0).
  • .pre-commit-config.yaml parses as valid YAML; the 4 surviving tools/inflight_batcher_llm/* config entries are the kept shared client tools; no removed-path reference remains in any generated config.
  • jenkins/Build.groovy "Step" comments are sequential 1–5, with no gaps.
  • No repo-wide reference to the removed backend/templates/tools remains, except the PR3-owned tests/integration/defs/triton_server/* defs (see Notes), the disaggregated_serving blog link, and helix.md's unrelated test_disaggregated_serving.py path. Swept: libtriton_tensorrtllm, trtllmExecutorWorker, whisper_bls, custom_metrics_reporter, L0_backend_trtllm, tritonbuild, generate_engines.sh, all_models/{gpt,whisper,multimodal,disaggregated_serving}, tools/{gpt,whisper,multimodal} — all clean.
  • Committed with --no-verify (the type-check/mypy hook has a known stash-rollback glitch in this sandbox); the hooks above were run manually and pass. No runtime tensorrt_llm/** or cpp/** source is touched.
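The sweep and the Step-numbering check above are both simple text scans. The sketch below runs them over an in-memory stand-in for a checkout. It assumes the Build.groovy comments take the form "// Step N", which this PR does not show. The function names and the toy file contents are hypothetical. In a real tree the same scan is typically done with git grep.

```python
import re

# Tokens taken from the sweep list above (subset; extend as needed).
REMOVED_TOKENS = [
    "libtriton_tensorrtllm",
    "trtllmExecutorWorker",
    "whisper_bls",
    "custom_metrics_reporter",
    "L0_backend_trtllm",
    "tritonbuild",
    "generate_engines.sh",
]


def sweep(files: dict[str, str], tokens: list[str],
          allowed_prefixes: tuple[str, ...]) -> list[tuple[str, str]]:
    """Return (path, token) pairs for every removed token found outside allowed paths."""
    hits = []
    for path, text in files.items():
        if path.startswith(allowed_prefixes):
            continue  # e.g. defs owned by the follow-up PR
        hits.extend((path, token) for token in tokens if token in text)
    return hits


def steps_are_contiguous(groovy_text: str) -> bool:
    """True if '// Step N' comments read 1, 2, ..., k with no gaps or repeats."""
    steps = [int(n) for n in re.findall(r"//\s*Step\s+(\d+)", groovy_text)]
    return steps == list(range(1, len(steps) + 1))


repo = {  # toy in-memory stand-in for a checkout
    "docker/Dockerfile.multi": "COPY triton_backend/all_models /app/all_models",
    "tests/integration/defs/triton_server/test_triton.py": "whisper_bls",
}
print(sweep(repo, REMOVED_TOKENS, ("tests/integration/defs/triton_server/",)))  # []
print(steps_are_contiguous("// Step 1: fetch\n// Step 2: build\n// Step 3: package\n"))  # True
```

The allow-list mirrors the exceptions noted above. The PR3-owned triton_server defs are skipped, so they do not count as stale references.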

CI / QA safety

Confirmed no breaking change to the CI/QA test machinery:

  • No test-ID / test-list file is touched — the only test-list-adjacent file in the commit is legacy-files.txt (lint exempt-file source), not test-db/*.yml, waives.txt, or any qa/*.txt.
  • scripts/check_test_list.py --validate → "OK: 2701 unique test entries validated." (exit 0). This is the AST validator behind CI's check_test_list.py --l0 --qa --waive step (L0_Test.groovy:2674). All triton_server/* entries in test-db/*.yml (61), waives.txt (23), and qa/llm_triton_integration.txt (351) still resolve because their def files live on this branch.
  • Jenkins edits leave no orphaned symbols: llmPath/tritonShortTag fully gone from Build.groovy; isAarch64 fully gone from L0_Test.groovy.
  • Surviving triton_backend/ references all point at kept paths: LLM_BACKEND_ROOT=${llmSrc}/triton_backend (root still exists), the L0_MergeRequest.groovy path→stage mapping ("triton_backend/": ["-Triton-"]), and the Dockerfile.multi tritonrelease COPYs.
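"Still resolve" means that each list entry names a def file that exists and a test function defined in it. The following is a hypothetical sketch of that check using Python's ast module, in the spirit of, but not copied from, scripts/check_test_list.py. The resolves helper and the toy sources mapping are invented. The sketch assumes bare entries of the form file.py::[Class::]test_name[params], optionally followed by an annotation such as SKIP (...).

```python
import ast


def resolves(entry: str, sources: dict[str, str]) -> bool:
    """True if a 'file.py::[Class::]test_name[params]' entry names an existing test."""
    if not entry.strip():
        return False
    test_id = entry.split()[0]  # drop trailing "SKIP (...)" or similar annotations
    file_part, _, rest = test_id.partition("::")
    test_name = rest.split("[", 1)[0].split("::")[-1]
    source = sources.get(file_part)
    if source is None:
        return False  # def file gone: the entry no longer resolves
    tree = ast.parse(source)
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == test_name
        for node in ast.walk(tree)
    )


sources = {"triton_server/test_triton.py": "def test_gpt(model):\n    pass\n"}
print(resolves("triton_server/test_triton.py::test_gpt[gpt-ifb] SKIP (example)", sources))  # True
print(resolves("triton_server/test_triton_llm.py::test_gpt", sources))  # False
```

The second call shows why the list entries and the def files must be removed together. Once a def file is deleted, every entry pointing at it stops resolving.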

Notes

  • PR3 (TRTLLM-13783) hand-off — test lists. The Triton-server test-list/waive entries are
    a unit with the triton_server/*.py def files PR3 removes, so PR3 cleans them (splitting the
    edits here would cause merge conflicts and doesn't help — the defs still exist on this branch,
    so all entries resolve). PR3 must purge, alongside the def files:
    • waives.txt lines 571–593 — 23 triton_server/test_triton{,_llm,_rcca}.py::* SKIPs.
    • test-db/*.yml: 61 entries across l0_a30.yml (43), l0_a100.yml (14), l0_b200.yml (3), l0_dgx_h200.yml (1); plus the 351 entries in qa/llm_triton_integration.txt.
    • Do NOT touch waives.txt lines 87, 370, 379, 380, 384, 388 and the [triton-…]/triton_ssm entries in qa/llm_function_rtx6k.txt/llm_function_stress.txt — those are the PyTorch Triton kernel backend (GPT-OSS/Nemotron-H/AutoDeploy), a different "triton".
  • Landing coupling. This PR and PR3 must land together (or PR3 first): several of the 61
    active test-db entries (test_gpt, test_whisper, test_python_bls_unit_tests, …) run
    against templates this PR deletes, so they would fail at execution in the gap. Static list
    validation stays green throughout — this is a runtime coupling only.
  • Docs deferred to PR2 (TRTLLM-13782). docs/.../blog05_Disaggregated_Serving_in_TensorRT-LLM.md
    links to the removed all_models/disaggregated_serving dir (now a dead GitHub link); left
    untouched to keep this PR docs-free.
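The hand-off warns that "triton" names two different things in the test lists. A purge filter for PR3 therefore has to match the Triton-server def files exactly and leave the PyTorch Triton-kernel entries alone. The sketch below shows one way to write such a filter. The regex and the purge helper are hypothetical, and the sample lines are illustrative rather than copied from waives.txt.

```python
import re

# Only the Triton-server integration defs: test_triton.py, test_triton_llm.py,
# test_triton_rcca.py. PyTorch Triton-kernel entries must not match.
TRITON_SERVER_DEF = re.compile(r"\btriton_server/test_triton(?:_llm|_rcca)?\.py::")


def purge(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split waive lines into (kept, dropped) by whether they target a removed def."""
    kept: list[str] = []
    dropped: list[str] = []
    for line in lines:
        (dropped if TRITON_SERVER_DEF.search(line) else kept).append(line)
    return kept, dropped


waives = [  # illustrative lines, not copied from waives.txt
    "triton_server/test_triton.py::test_gpt[gpt] SKIP (example)",
    "triton_server/test_triton_rcca.py::test_rcca_bug_1[x] SKIP (example)",
    "accuracy/test_example.py::TestModel::test_moe[triton] SKIP (example)",
    "unittest/test_triton_kernel.py::test_op SKIP (example)",
]
kept, dropped = purge(waives)
print(len(kept), len(dropped))  # 2 2
```

Matching on the full triton_server/test_triton… path, rather than on the bare word "triton", keeps the [triton-…] parametrizations and Triton-kernel tests in place. Re-running scripts/check_test_list.py --validate afterwards would confirm that the remaining entries still resolve.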

Remaining to land the PR

  • Rebase check: could not git fetch origin main in this sandbox (access/network); the locally-known origin/main is fb0d68be9c == HEAD~1, so the branch is a clean single commit. Rebase onto true top-of-tree before opening the PR if it has advanced.
  • Push + open PR against NVIDIA/TensorRT-LLM main (needs GH_CONFIG_DIR confirmed).

Summary by CodeRabbit

  • Chores
    • Simplified the backend release and CI packaging flow by removing several bundled Triton backend artifacts and related build steps.
    • Updated file selection and formatting rules to reflect the new, smaller set of tracked backend files.
    • Cleaned up outdated model, client, test, and documentation entries across multiple backend areas.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Wanli-Jiang marked this pull request as ready for review July 3, 2026 05:40
Wanli-Jiang requested review from a team as code owners July 3, 2026 05:40

coderabbitai Bot commented Jul 3, 2026

Contributor


📝 Walkthrough


This PR removes the legacy triton_backend implementation (models, configs, clients, CI scripts, C++ backend source, and documentation) and updates build/deployment pipelines (Dockerfile.multi, Jenkins scripts) and tooling configs (pre-commit, ruff, legacy-files) to reference a much smaller retained subset of triton_backend files.

Changes

Triton Backend Legacy Removal and Tooling Updates

| Layer | File(s) | Summary |
|---|---|---|
| Build and deploy pipeline updates | docker/Dockerfile.multi, jenkins/Build.groovy, jenkins/L0_Test.groovy | Docker's tritonrelease stage drops the intermediate build stage and only copies all_models/scripts/tools; Jenkins packaging and test setup no longer build or copy compiled Triton backend artifacts. |
| Tooling/config file-list updates | .pre-commit-config.yaml, legacy-files.txt, pyproject.toml, ruff-legacy.toml, tests/integration/defs/thirdparty/test_cmake_third_party.py | File-selection regex/include/exclude lists across lint/format configs are narrowed to a reduced set of triton_backend paths (mostly llmapi and select tools), and a CMake ignore pattern for inflight_batcher_llm is removed. |
| Legacy model, config, client, CI, and C++ backend removal | triton_backend/all_models/..., triton_backend/ci/..., triton_backend/inflight_batcher_llm/... | A large set of Triton backend Python models, .pbtxt configs, client scripts, CI test/metrics scripts, C++ backend source/headers, CMake build files, and documentation are deleted entirely (e.g., GPT/inflight-batcher/disaggregated-serving/multimodal/whisper models, ensemble configs, custom metrics reporter, libtensorrtllm.cc). |

Estimated code review effort: 4 (Complex) | ~60 minutes

Suggested reviewers: tfogal, Superjomn, brb-nv, qixiang-99, yizhang-nv, fredricz-20070104

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title matches the main change and uses the required [ticket][type] format. |
| Description check | ✅ Passed | It covers scope, validation, CI safety, and rollout notes, though it uses custom headings instead of the template's exact sections. |

Comment @coderabbitai help to get the list of available commands.

Remove the Triton serving surface that targets legacy TensorRT engines, as
part of deprecating the TensorRT backend. Only the PyTorch/LLM API Triton
backend (all_models/llmapi) and the shared launcher/client tooling remain.

- Delete the C++ Triton backend triton_backend/inflight_batcher_llm/
  (source, client, cmake, build/test scripts); it builds
  libtriton_tensorrtllm.so / trtllmExecutorWorker.
- Delete the legacy TensorRT-engine model-repo templates, all of which run
  TRT engines (via the C++ backend or tensorrt_llm.runtime ModelRunner/Session):
  all_models/{inflight_batcher_llm,disaggregated_serving,gpt,whisper,multimodal}
  (multimodal was already deprecated, EOL V1.2). Delete their paired client
  tools tools/{gpt,whisper,multimodal}.
- Delete triton_backend/ci/L0_backend_trtllm/ (the C++-backend QA CI) and the
  all_models/tests/ unit tests that only exercised the removed templates.
- launch_triton_server.py: default --model_repo now points at the surviving
  all_models/llmapi template.
- docker/Dockerfile.multi: drop the tritonbuild stage (built the C++ backend)
  and the tritonrelease COPYs of the backend .so, its scripts, and client.
- jenkins: drop the C++-backend cmake build and artifact packaging
  (Build.groovy) and the copy into /opt/tritonserver/backends/tensorrtllm
  (L0_Test.groovy).
- Regenerate the legacy lint configs (legacy-files.txt -> ruff-legacy.toml,
  pyproject.toml, .pre-commit-config.yaml) and drop the stale
  inflight_batcher_llm IGNORE_PATTERNS entry in the cmake third-party test.

The legacy TensorRT-engine Triton integration test defs under
tests/integration/defs/triton_server/ are handled separately in the
TRTLLM-13783 test-removal change.

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Wanli-Jiang force-pushed the user/williamj/deprecated-trt-backend-step1p4 branch from ebebf27 to afc5f52 July 3, 2026 05:46
@Wanli-Jiang

Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57374 [ run ] triggered by Bot. Commit: afc5f52 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57374 [ run ] completed with state SUCCESS. Commit: afc5f52
/LLM/main/L0_MergeRequest_PR pipeline #46124 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
