Remove tensorrt backend #20
Open
QiJune wants to merge 3 commits into
added 3 commits June 22, 2026 10:54
Remove the TensorRT execution backend from the Python package so that tensorrt_llm imports and runs without the `tensorrt` package installed. PyTorch becomes the sole supported backend; autodeploy is preserved. The C++ TensorRT-library (nvinfer1) decoupling lands in a follow-up.

Shared-symbol extraction (TensorRT-free homes, extract-before-delete):
- functional enums / AllReduce types / RopeEmbeddingUtils -> functional_enums.py
- QuantConfig/LayerQuantConfig -> quantization/quant_config.py; QuantAlgo redirect
- runtime ModelConfig -> runtime/model_config.py; CUASSERT -> _utils
- quant weight-preprocessing -> quantization/preprocessing.py
- PretrainedConfig/SpeculativeDecodingMode trimmed into a TensorRT-free models/modeling_utils.py; QuantModeWrapper kept TensorRT-free in _utils
- TensorRT-free package init (_init_runtime) replaces _common._init
- extract TensorRT-free CustomAllReduceHelper; relocate _is_building

Backend removal:
- delete builder/network/functional/graph_rewriting/_common/module/parameter/python_plugin/plugin/layers, TRT model impls, TRT runtime classes (Session/GenerationSession/ModelRunner*/EncDec/Multimodal), _tensorrt_engine, TRT quant (layers/quantize/functional), TRT tools, bench/build, trtllm-build/refit/prune commands
- reject backend="tensorrt"/"trt" with ValueError across LLM API and CLIs; drop trtllm-build/refit/prune console scripts and 'tensorrt' telemetry
- remove module-level _tensorrt_engine imports + dispatch in serve/eval/openai_server/bench/evaluate
- decouple executor workers from builder.Engine; decouple logger/profiler from tensorrt

Sweep:
- remove TensorRT Triton engine backend paths + L0_backend_trtllm CI
- delete TensorRT docs + example dirs; add migration guide
- remove TensorRT unit tests and their CI test-list entries

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
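The "reject backend="tensorrt"/"trt" with ValueError" item above could look roughly like the following. This is a minimal sketch, not the code from the commit: `validate_backend`, `REMOVED_BACKENDS`, `SUPPORTED_BACKENDS`, the accepted backend names, and the error wording are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical names: none of these identifiers come from the PR itself.
REMOVED_BACKENDS = {"tensorrt", "trt"}
# Assumption: the surviving backends are spelled like this.
SUPPORTED_BACKENDS = {"pytorch", "_autodeploy"}


def validate_backend(backend: Optional[str] = None) -> str:
    """Return a normalized backend name, or raise ValueError.

    A missing backend falls back to "pytorch", since the commit
    message says PyTorch becomes the sole supported backend.
    """
    if backend is None:
        return "pytorch"
    name = backend.strip().lower()
    if name in REMOVED_BACKENDS:
        # Fail loudly with a migration hint instead of silently remapping.
        raise ValueError(
            f"backend={backend!r} was removed; use the PyTorch backend instead"
        )
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    return name
```

Raising early at argument-validation time, rather than remapping "trt" to "pytorch", keeps old scripts from silently changing behavior.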
Sever the C++ tree's compile- and link-time dependency on TensorRT so the shared core (runtime, batch manager, executor API, KV cache, sampling, kernels, nanobind bridge) builds, links, and runs without the TensorRT library. Follows the Python TensorRT-backend removal; PyTorch is the sole backend.

Internal types (serialization-compatible):
- add tensorrt_llm::DataType/Dims/ILogger (common/tllmDataType.h); DataType enumerator values mirror nvinfer1::DataType for byte-compatible serialization
- migrate nvinfer1::DataType/Dims/ILogger -> tensorrt_llm:: across ~155 files; replace NvInfer.h includes with the internal header

Remove the TensorRT-engine execution path (unused by the PyTorch backend):
- plugins/ (nvinfer_plugin_tensorrt_llm), engine runtime wrappers (tllmRuntime, tllmStreamReaders, layerProfiler, rawEngine, tllmLogger), TRT model adapters (trtGptModel*/trtEncoderModel/trtGptModelFactory), executor.cpp/executorImpl, the C++ Executor + TllmRuntime nanobind bindings, executor_worker, disaggServerUtil, the engine I/O buffers and engine-only logits/decoder algos
- relocate the retained KVCacheEvent ctor and executor::version() (which had lived in executorImpl/executor.cpp) into executor/kvCacheEvent.cpp
- decouple shared batch_manager files (inflightBatchingUtils, medusaBuffers, lookaheadBuffers) from the removed engine runtime, keeping the PyTorch surface

Build/packaging (no TensorRT):
- drop find_package(TensorRT)/TRT_LIB/NvInfer include injection from cpp CMake, the OnnxParser test link, the plugins/executor_worker subdirs and wheel target
- setup.py no longer packages libnvinfer_plugin_tensorrt_llm.so / executorWorker
- build_wheel.py: TensorRT_ROOT optional, drop the tensorrt install check
- requirements.txt: drop tensorrt

Verified: libtensorrt_llm.so + th_common + bindings link with no libnvinfer NEEDED entry; import tensorrt_llm + bindings + LLM succeed with tensorrt blocked; bindings.DataType members/values unchanged (AC-5.1 parity test added).

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
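The "no libnvinfer NEEDED entry" check mentioned above can be reproduced by inspecting the dynamic section of each shared object. The sketch below only parses text in the format printed by `readelf -d`; the helper names `needed_libraries` and `links_tensorrt` are hypothetical, and how the commit author actually ran the check is not stated in the message.

```python
import re
from typing import List

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcudart.so.12]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Extract the sonames listed as NEEDED in `readelf -d` output."""
    return _NEEDED_RE.findall(readelf_output)


def links_tensorrt(readelf_output: str) -> bool:
    """True if any NEEDED entry is a libnvinfer* library."""
    return any(
        name.startswith("libnvinfer")
        for name in needed_libraries(readelf_output)
    )
```

In practice the input would come from something like `subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout`, run once per library named in the commit message.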
…tion

Make the unit test suite collectable after the TensorRT backend removal and clean up remaining stale references found by an audit.

- tests/unittest/utils/util.py: drop deleted TRT imports (torch_dtype_to_trt, trt_dtype_to_torch, ContextFMHAType, Session/TensorInfo) and the TRT-engine test helpers; keep the PyTorch/GPU helpers.
- redirect test imports: functional enums -> functional_enums, quantization weight-preproc -> quantization.preprocessing; stub/lazy the removed TRT LLM symbols in mixed test files; delete tests that are entirely TensorRT-backend.
- tensorrt_llm/_torch/auto_deploy/llm_args.py: remove the vestigial build_config field/validator/import (BuildConfig was deleted) so the preserved AutoDeploy backend imports again.
- serialization.py: drop allowlist entries for deleted builder.BuildConfig and plugin.plugin.PluginConfig.
- runtime.pyi: drop the removed TllmRuntime stub.
- usage/llm_args_golden_manifest.json: regenerate (no TrtLlmArgs / 'tensorrt').
- cpp iTensor.h: remove the unused nvinfer1 forward declaration.

Verified: tensorrt_llm + bindings + LLM + AutoDeploy import with tensorrt blocked; bindings.DataType parity holds; unit tests collect (the only remaining collection errors are an unrelated missing optional 'ray' dependency).

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
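One way to run an "import with tensorrt blocked" check, even on a machine where the `tensorrt` wheel is installed, is to poison its `sys.modules` entry: Python raises `ModuleNotFoundError` for any module whose entry is `None`. The following is a sketch under that assumption; `blocked_module` and `imports_without` are hypothetical helpers, and the commit message does not say which blocking mechanism was actually used.

```python
import contextlib
import importlib
import sys

_MISSING = object()


@contextlib.contextmanager
def blocked_module(name: str):
    """Make `import <name>` fail inside the with-block."""
    saved = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # None entry => import raises ModuleNotFoundError
    try:
        yield
    finally:
        # Restore the previous state so later imports are unaffected.
        if saved is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved


def imports_without(blocked: str, target: str) -> bool:
    """True if `target` imports while `blocked` is unavailable.

    Note: a `target` that is already in sys.modules is returned from the
    cache, so this is only meaningful in a fresh interpreter.
    """
    with blocked_module(blocked):
        try:
            importlib.import_module(target)
        except ImportError:
            return False
        return True
```

A call such as `imports_without("tensorrt", "tensorrt_llm")` in a fresh process would then approximate the verification step described above.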
Wanli-Jiang added a commit to Wanli-Jiang/TensorRT-LLM that referenced this pull request Jun 30, 2026
Remove legacy TensorRT-backend example directories and documentation pages, add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop TensorRT references from the surviving docs (index, release notes, torch arch/kv-cache overviews, and two blogs).

Pure docs/examples change: no Python or C++ source is touched, so the package build and import are unaffected.

First of the incremental PRs that split Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Wanli-Jiang added a commit to Wanli-Jiang/TensorRT-LLM that referenced this pull request Jun 30, 2026
Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and tools/plugin_gen) and the TensorRT-engine Triton backend paths under triton_backend/, plus the deleted triton_server integration tests.

Re-derive the test-list cleanup against current main (Freddy's draft predates main by ~165 commits, so his hunks no longer apply): drop the now-dangling entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT waives, and replace qa/llm_triton_integration.txt with the llmapi-backend triton tests. Entries were removed only where the referenced test path no longer exists on disk, so no surviving PyTorch test is affected.

No Python or C++ source is touched. Second of the incremental PRs splitting Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
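The rule "remove an entry only where the referenced test path no longer exists on disk" can be expressed as a small filter. This is an illustrative sketch, not the script used for the commit: `prune_entries` is a hypothetical name, and it assumes each entry is a pytest-style node ID (`path/to/test_file.py::test_name`) relative to a known tests root.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def prune_entries(
    entries: Iterable[str], tests_root: str
) -> Tuple[List[str], List[str]]:
    """Split entries into (kept, dropped) by whether their file exists.

    Assumption: the file path is everything before the first "::".
    Entries whose file is still on disk are always kept, so surviving
    tests cannot be removed by accident.
    """
    root = Path(tests_root)
    kept: List[str] = []
    dropped: List[str] = []
    for entry in entries:
        file_part = entry.split("::", 1)[0].strip()
        if (root / file_part).exists():
            kept.append(entry)
        else:
            dropped.append(entry)
    return kept, dropped
```

Returning the dropped entries alongside the kept ones makes the cleanup auditable: the dropped list can be reviewed before any test-list file is rewritten.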
@coderabbitai summary
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.