[TRTLLM-13784][chore] Remove legacy TensorRT-engine Triton backend#15907
Wanli-Jiang wants to merge 1 commit into
Walkthrough: This PR removes the legacy …
Changes: Triton Backend Legacy Removal and Tooling Updates
Estimated code review effort: 4 (Complex) | ~60 minutes
Pre-merge checks: 5 passed
Remove the Triton serving surface that targets legacy TensorRT engines, as
part of deprecating the TensorRT backend. Only the PyTorch/LLM API Triton
backend (all_models/llmapi) and the shared launcher/client tooling remain.
- Delete the C++ Triton backend triton_backend/inflight_batcher_llm/
(source, client, cmake, build/test scripts); it builds
libtriton_tensorrtllm.so / trtllmExecutorWorker.
- Delete the legacy TensorRT-engine model-repo templates, all of which run
TRT engines (via the C++ backend or tensorrt_llm.runtime ModelRunner/Session):
all_models/{inflight_batcher_llm,disaggregated_serving,gpt,whisper,multimodal}
(multimodal was already deprecated, EOL V1.2). Delete their paired client
tools tools/{gpt,whisper,multimodal}.
- Delete triton_backend/ci/L0_backend_trtllm/ (the C++-backend QA CI) and the
all_models/tests/ unit tests that only exercised the removed templates.
- launch_triton_server.py: default --model_repo now points at the surviving
all_models/llmapi template.
- docker/Dockerfile.multi: drop the tritonbuild stage (built the C++ backend)
and the tritonrelease COPYs of the backend .so, its scripts, and client.
- jenkins: drop the C++-backend cmake build and artifact packaging
(Build.groovy) and the copy into /opt/tritonserver/backends/tensorrtllm
(L0_Test.groovy).
- Regenerate the legacy lint configs (legacy-files.txt -> ruff-legacy.toml,
pyproject.toml, .pre-commit-config.yaml) and drop the stale
inflight_batcher_llm IGNORE_PATTERNS entry in the cmake third-party test.
The legacy TensorRT-engine Triton integration test defs under
tests/integration/defs/triton_server/ are handled separately in the
TRTLLM-13783 test-removal change.
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
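
The launcher change in the commit message (the default --model_repo now pointing at the surviving all_models/llmapi template) can be pictured with a small sketch. This is not the actual launch_triton_server.py: the helper names `default_model_repo` and `build_parser` are hypothetical, and the layout (launcher in triton_backend/scripts/, templates in the sibling all_models/ directory) is assumed from the paths quoted in this PR. Only the `--model_repo` flag and the `all_models/llmapi` target come from the source.

```python
import argparse
from pathlib import Path


def default_model_repo(script_path: str) -> str:
    """Resolve the default model repo relative to the launcher script.

    Hypothetical helper: assumes the script sits in triton_backend/scripts/
    and the surviving template sits in triton_backend/all_models/llmapi.
    """
    scripts_dir = Path(script_path).resolve().parent
    return str(scripts_dir.parent / "all_models" / "llmapi")


def build_parser(script_path: str) -> argparse.ArgumentParser:
    """Build a parser whose --model_repo default is the llmapi template."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_repo",
        type=str,
        default=default_model_repo(script_path),
        help="Path to the Triton model repository (assumed default: llmapi).",
    )
    return parser
```

An explicit `--model_repo` on the command line still overrides the default, so callers that passed a path before the change are unaffected in this sketch.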
ebebf27 to afc5f52
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #57374 [ run ] triggered by Bot. Commit:
PR_Github #57374 [ run ] completed with state
Features
Step 1.4 of removing the legacy TensorRT backend: delete every Triton serving surface that targets legacy TensorRT engines, namely the C++ backend (inflight_batcher_llm) and all its TRT-engine model-repo templates (gpt/whisper/multimodal/disaggregated_serving) plus their paired client tools, keeping only the PyTorch/LLM API Triton backend (all_models/llmapi) and the shared launcher/client tooling.

user/williamj/deprecated-trt-backend-step1p4 · base: origin/main @ fb0d68be9c · HEAD: ebebf27d90 · 94 files, +5 / −25,978 (85 deletions, 9 edits).

Scope
- Removed: the C++ backend, the TRT-engine model-repo templates, their paired client tools, and the C++-backend QA CI. The templates all run TRT engines (via the C++ backend or tensorrt_llm.runtime ModelRunner/Session); multimodal was already deprecated (EOL V1.2).
- Kept: the llmapi (PyTorch) backend, the shared launcher, and the shared client tools.
- Deferred: the integration test defs under tests/integration/defs/triton_server/ (and their test-db/waives/qa list entries) are removed by TRTLLM-13783 (#15810), not here, to keep the two PRs non-overlapping. See the PR3 hand-off in Notes.
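
The diff summary (85 deletions, 9 edits) and the scope boundary above can be spot-checked mechanically. The sketch below is one possible way to do it, not part of this PR: it tallies status letters from text in the format printed by `git diff --name-status` (status, tab, path) and flags any path under the deferred triton_server/ defs directory. The function name `summarize_name_status` is hypothetical.

```python
from collections import Counter

# Directory owned by the separate TRTLLM-13783 change, per the Scope section.
DEFERRED_PREFIX = "tests/integration/defs/triton_server/"


def summarize_name_status(text):
    """Tally change types and collect paths that fall outside this PR's scope.

    `text` is expected to look like `git diff --name-status` output:
    one entry per line, a status code (D, M, A, R100, ...) then tab-separated paths.
    """
    counts = Counter()
    deferred = []
    for line in text.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        counts[status[0]] += 1  # "R100" and similar collapse to their first letter
        deferred.extend(p for p in paths if p.startswith(DEFERRED_PREFIX))
    return counts, deferred
```

For a change like this one, the expectation would be a `D` count matching the deletions, an `M` count matching the edits, and an empty `deferred` list.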
Removed (85)
Every file is a pure deletion — it serves or exercises legacy TensorRT engines and has no
consumer once the TRT backend is gone.
C++ Triton backend: triton_backend/inflight_batcher_llm/ (28)
Builds libtriton_tensorrtllm.so + trtllmExecutorWorker; the entire engine-serving backend.
- CMakeLists.txt, cmake/TritonTensorRTLLMBackendConfig.cmake.in, cmake/modules/set_ifndef.cmake, scripts/build.sh: build system for the .so/worker.
- src/libtensorrtllm.cc, src/libtriton_tensorrtllm.ldscript: Triton backend entry + symbol export map.
- src/model_state.{cc,h}, src/model_instance_state.{cc,h}: Triton model/instance lifecycle driving the TRT executor.
- src/custom_metrics_reporter/custom_metrics_reporter.{cc,h}: backend-specific Triton metrics.
- src/namedTensor.{cpp,h}, src/utils.{cc,h}: backend-internal request/tensor plumbing.
- client/{__init__.py,README.md,inflight_batcher_llm_client.py,end_to_end_grpc_client.py,e2e_grpc_speculative_decoding_client.py}: gRPC clients hard-wired to this backend.
- tests/{CMakeLists.txt,modelState.cpp,modelInstanceStateTest.cpp,utilsTest.cpp,first.json,second.json,third.json}: C++ unit tests of the removed src/.

TRT-engine model-repo templates: triton_backend/all_models/ (43)
Each template drives TRT engines via the C++ backend or tensorrt_llm.runtime ModelRunner/Session.
- inflight_batcher_llm/ (12): ensemble/{1/.tmp,config.pbtxt}, preprocessing/{1/model.py,config.pbtxt}, postprocessing/{1/model.py,config.pbtxt}, tensorrt_llm/{1/model.py,config.pbtxt}, tensorrt_llm_bls/{1/model.py,1/lib/decode.py,1/lib/triton_decoder.py,config.pbtxt}: the flagship engine template + its BLS.
- gpt/ (8): ensemble/{1/.tmp,config.pbtxt}, preprocessing, postprocessing, tensorrt_llm (model.py + config.pbtxt each): GPT engine template.
- disaggregated_serving/ (4): README.md, disaggregated_serving.md, disaggregated_serving_bls/{1/model.py,config.pbtxt}: BLS orchestrator over two engine instances; dead without the backend.
- multimodal/ (10): Deprecation_notice.md, ensemble/config.pbtxt, multimodal_encoders/{1/model.py,1/multimodal_utils.py,config.pbtxt}, requirements-{llava-onevision,mistral3.1,mllama,qwen2vl,vila}.txt: already deprecated (EOL V1.2).
- whisper/ (4): whisper_bls/{1/model.py,1/fbank.py,1/tokenizer.py,config.pbtxt}: Whisper engine BLS template.
- tests/ (5): test_decode.py, test_triton_decoder.py, test_python_backend.py, test_multi_image_preprocess.py, test_multimodal_encoders.py: unit tests whose PYTHONPATH targeted the removed templates.

Paired client tools: triton_backend/tools/ (8)
- gpt/ (6): client.py, client_async.py, end_to_end_test.py, benchmark_core_model.py, gen_input_data.py, input_data.json.
- multimodal/client.py, whisper/client.py.

C++-backend QA CI: triton_backend/ci/ (6)
ci/README.md documented only this CI, so ci/ goes wholesale.
- L0_backend_trtllm/{generate_engines.sh,test.sh,simple_data.json,base_metrics_verification_tests.py,custom_metrics_verification_tests.py}, README.md.

Kept (deliberately)
- triton_backend/all_models/llmapi/: the PyTorch/LLM API Triton backend; test_llmapi_python_backend.py is the sole surviving all_models/tests/ file.
- triton_backend/scripts/launch_triton_server.py (default --model_repo now points at ../all_models/llmapi), triton_backend/requirements.txt.
- triton_backend/tools/: inflight_batcher_llm/ client scripts (end_to_end_test.py / benchmark_core_model.py imported by the llmapi PyTorch test), llmapi_client.py, fill_template.py, dataset/, utils/, tests/.

Edited (9)
- docker/Dockerfile.multi: dropped the tritonbuild stage (compiled the backend via inflight_batcher_llm/scripts/build.sh) and the three tritonrelease COPYs: inflight_batcher_llm/scripts, inflight_batcher_llm/client, and --from=tritonbuild …/backends/tensorrtllm. Why: the stage built libtriton_tensorrtllm.so; keeping it would fail the build. Kept the tritonrelease COPYs of all_models/scripts/tools.
- jenkins/Build.groovy: dropped the cmake … && make install step that built inflight_batcher_llm/build/ and the two cp libtriton_tensorrtllm.so / trtllmExecutorWorker packaging lines; renumbered trailing "Step" comments (3/4/5/6 → 3/4/5).
- jenkins/L0_Test.groovy: dropped mkdir -p /opt/tritonserver/backends/tensorrtllm + the two non-aarch64 cp …so / …Worker copies into it.
- tests/integration/defs/thirdparty/test_cmake_third_party.py: dropped the "triton_backend/inflight_batcher_llm/*" entry (+ 2 comment lines) from IGNORE_PATTERNS. Why: … CMakeLists.txt.
- triton_backend/scripts/launch_triton_server.py: default --model_repo: …/../all_models/gpt → …/../all_models/llmapi. Why: all_models/gpt was deleted; default must point at a surviving template.
- legacy-files.txt: … tools/inflight_batcher_llm/* entries stay (shared, kept).
- ruff-legacy.toml: regenerated from legacy-files.txt (scripts/legacy_utils.py gen-configs).
- pyproject.toml: regenerated from legacy-files.txt.
- .pre-commit-config.yaml: regenerated from legacy-files.txt (each file appears in two hook files: alternations → 2×). … verify-legacy-config hook enforces it.

Validation
Re-verified on HEAD ebebf27d90:
- scripts/legacy_utils.py check-configs → "All generated configs are up to date"; verify-legacy-config pre-commit hook passes (exit 0).
- .pre-commit-config.yaml parses as valid YAML; the 4 surviving tools/inflight_batcher_llm/* config entries are the kept shared client tools; no removed-path reference remains in any generated config.
- jenkins/Build.groovy "Step" comments sequential 1–5, no gaps.
- … tests/integration/defs/triton_server/* defs (see Notes), the disaggregated_serving blog link, and helix.md's unrelated test_disaggregated_serving.py path. Swept: libtriton_tensorrtllm, trtllmExecutorWorker, whisper_bls, custom_metrics_reporter, L0_backend_trtllm, tritonbuild, generate_engines.sh, all_models/{gpt,whisper,multimodal,disaggregated_serving}, tools/{gpt,whisper,multimodal}: all clean.
- … --no-verify (the type-check/mypy hook has a known stash-rollback glitch in this sandbox); the hooks above were run manually and pass. No runtime tensorrt_llm/** or cpp/** source is touched.

CI / QA safety
Confirmed no breaking change to the CI/QA test machinery:
- … legacy-files.txt (lint exempt-file source), not test-db/*.yml, waives.txt, or any qa/*.txt.
- scripts/check_test_list.py --validate → OK: 2701 unique test entries validated. (exit 0). This is the AST validator behind CI's check_test_list.py --l0 --qa --waive step (L0_Test.groovy:2674). All triton_server/* entries in test-db/*.yml (61), waives.txt (23), and qa/llm_triton_integration.txt (351) still resolve because their def files live on this branch.
- … llmPath / tritonShortTag fully gone from Build.groovy; isAarch64 fully gone from L0_Test.groovy.
- … triton_backend/ references all point at kept paths: LLM_BACKEND_ROOT=${llmSrc}/triton_backend (root still exists), the L0_MergeRequest.groovy path→stage mapping ("triton_backend/": ["-Triton-"]), and the Dockerfile.multi tritonrelease COPYs.

Notes
- … a unit with the triton_server/*.py def files PR3 removes, so PR3 cleans them (splitting the edits here would cause merge conflicts and doesn't help: the defs still exist on this branch, so all entries resolve). PR3 must purge, alongside the def files:
  - waives.txt lines 571–593: 23 triton_server/test_triton{,_llm,_rcca}.py::* SKIPs.
  - test-db/*.yml: 61 entries across l0_a30.yml (43), l0_a100.yml (14), l0_b200.yml (3), l0_dgx_h200.yml (1); plus the 351 entries in qa/llm_triton_integration.txt.
  - Do not purge waives.txt lines 87, 370, 379, 380, 384, 388 or the [triton-…] / triton_ssm entries in qa/llm_function_rtx6k.txt / llm_function_stress.txt: those are the PyTorch Triton kernel backend (GPT-OSS/Nemotron-H/AutoDeploy), a different "triton".
- … active test-db entries (test_gpt, test_whisper, test_python_bls_unit_tests, …) run against templates this PR deletes, so they would fail at execution in the gap. Static list validation stays green throughout; this is a runtime coupling only.
- docs/.../blog05_Disaggregated_Serving_in_TensorRT-LLM.md links to the removed all_models/disaggregated_serving dir (now a dead GitHub link); left untouched to keep this PR docs-free.
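
The notes above distinguish two kinds of "triton" entries: the legacy triton_server/test_triton{,_llm,_rcca}.py tests that PR3 must purge, and unrelated PyTorch Triton kernel entries. A filter for that distinction could look roughly like the sketch below. It assumes only that a purge candidate contains the test-file path followed by `::`; the function name `split_waives` and the sample lines in the usage note are hypothetical, and the real waives.txt line format is not reproduced here.

```python
import re

# Matches triton_server/test_triton.py::, test_triton_llm.py::, test_triton_rcca.py::
LEGACY_TRITON_SERVER = re.compile(r"triton_server/test_triton(?:_llm|_rcca)?\.py::")


def split_waives(lines):
    """Split list entries into (purge, keep).

    An entry is a purge candidate only if it references one of the legacy
    triton_server test files; any other mention of "triton" is kept.
    """
    purge, keep = [], []
    for line in lines:
        (purge if LEGACY_TRITON_SERVER.search(line) else keep).append(line)
    return purge, keep
```

Matching on the def-file path rather than on the bare word "triton" is the point of the design: an entry such as a hypothetical `some_test.py::test_case[triton-fp8]` mentions Triton but stays in the keep list.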
Remaining to land the PR
- … git fetch origin main in this sandbox (access/network); the locally-known origin/main is fb0d68be9c == HEAD~1, so the branch is a clean single commit. Rebase onto true top-of-tree before opening the PR if it has advanced.
- … NVIDIA/TensorRT-LLM main (needs GH_CONFIG_DIR confirmed).

Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.