Remove tensorrt backend by QiJune · Pull Request #20 · QiJune/TensorRT-LLM

QiJune · 2026-06-22T15:06:40Z

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Remove the TensorRT execution backend from the Python package so that tensorrt_llm imports and runs without the `tensorrt` package installed. PyTorch becomes the sole supported backend; autodeploy is preserved. The C++ TensorRT-library (nvinfer1) decoupling lands in a follow-up.

Shared-symbol extraction (TensorRT-free homes, extract-before-delete):
- functional enums / AllReduce types / RopeEmbeddingUtils -> functional_enums.py
- QuantConfig/LayerQuantConfig -> quantization/quant_config.py; QuantAlgo redirect
- runtime ModelConfig -> runtime/model_config.py; CUASSERT -> _utils
- quant weight-preprocessing -> quantization/preprocessing.py
- PretrainedConfig/SpeculativeDecodingMode trimmed into a TensorRT-free models/modeling_utils.py; QuantModeWrapper kept TensorRT-free in _utils
- TensorRT-free package init (_init_runtime) replaces _common._init
- extract TensorRT-free CustomAllReduceHelper; relocate _is_building

Backend removal:
- delete builder/network/functional/graph_rewriting/_common/module/parameter/python_plugin/plugin/layers, TRT model impls, TRT runtime classes (Session/GenerationSession/ModelRunner*/EncDec/Multimodal), _tensorrt_engine, TRT quant (layers/quantize/functional), TRT tools, bench/build, trtllm-build/refit/prune commands
- reject backend="tensorrt"/"trt" with ValueError across LLM API and CLIs; drop trtllm-build/refit/prune console scripts and 'tensorrt' telemetry
- remove module-level _tensorrt_engine imports + dispatch in serve/eval/openai_server/bench/evaluate
- decouple executor workers from builder.Engine; decouple logger/profiler from tensorrt

Sweep:
- remove TensorRT Triton engine backend paths + L0_backend_trtllm CI
- delete TensorRT docs + example dirs; add migration guide
- remove TensorRT unit tests and their CI test-list entries

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
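For readers migrating callers, the backend rejection described in this commit could look roughly like the sketch below. The helper name `check_backend`, the error text, and the case-insensitive normalization are assumptions made for illustration; the commit message only states that backend="tensorrt"/"trt" is rejected with ValueError.

```python
# Hypothetical helper: the name, message, and normalization are illustrative,
# not taken from the PR's code.
REMOVED_BACKENDS = frozenset({"tensorrt", "trt"})


def check_backend(backend: str) -> str:
    """Return the normalized backend name, or raise ValueError if it was removed."""
    name = backend.strip().lower()  # assumption: matching is case-insensitive
    if name in REMOVED_BACKENDS:
        raise ValueError(
            f"backend={backend!r} is no longer supported: the TensorRT backend "
            "was removed; use the PyTorch backend instead."
        )
    return name
```

Failing at argument-validation time, rather than at the first missing import, gives callers a single clear error to act on.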

Sever the C++ tree's compile- and link-time dependency on TensorRT so the shared core (runtime, batch manager, executor API, KV cache, sampling, kernels, nanobind bridge) builds, links, and runs without the TensorRT library. Follows the Python TensorRT-backend removal; PyTorch is the sole backend.

Internal types (serialization-compatible):
- add tensorrt_llm::DataType/Dims/ILogger (common/tllmDataType.h); DataType enumerator values mirror nvinfer1::DataType for byte-compatible serialization
- migrate nvinfer1::DataType/Dims/ILogger -> tensorrt_llm:: across ~155 files; replace NvInfer.h includes with the internal header

Remove the TensorRT-engine execution path (unused by the PyTorch backend):
- plugins/ (nvinfer_plugin_tensorrt_llm), engine runtime wrappers (tllmRuntime, tllmStreamReaders, layerProfiler, rawEngine, tllmLogger), TRT model adapters (trtGptModel*/trtEncoderModel/trtGptModelFactory), executor.cpp/executorImpl, the C++ Executor + TllmRuntime nanobind bindings, executor_worker, disaggServerUtil, the engine I/O buffers and engine-only logits/decoder algos
- relocate the retained KVCacheEvent ctor and executor::version() (which had lived in executorImpl/executor.cpp) into executor/kvCacheEvent.cpp
- decouple shared batch_manager files (inflightBatchingUtils, medusaBuffers, lookaheadBuffers) from the removed engine runtime, keeping the PyTorch surface

Build/packaging (no TensorRT):
- drop find_package(TensorRT)/TRT_LIB/NvInfer include injection from cpp CMake, the OnnxParser test link, the plugins/executor_worker subdirs and wheel target
- setup.py no longer packages libnvinfer_plugin_tensorrt_llm.so / executorWorker
- build_wheel.py: TensorRT_ROOT optional, drop the tensorrt install check
- requirements.txt: drop tensorrt

Verified: libtensorrt_llm.so + th_common + bindings link with no libnvinfer NEEDED entry; import tensorrt_llm + bindings + LLM succeed with tensorrt blocked; bindings.DataType members/values unchanged (AC-5.1 parity test added).

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
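The "import ... with tensorrt blocked" verification can be reproduced in spirit with the sketch below. It relies on standard Python behavior: a `None` entry in `sys.modules` makes any later import of that name raise `ImportError`. The helper name `import_with_blocked` is hypothetical, and the commit does not say which mechanism was actually used.

```python
import importlib
import sys

_ABSENT = object()  # sentinel: the blocked name had no sys.modules entry


def import_with_blocked(module_name: str, blocked: str):
    """Import `module_name` while any import of `blocked` raises ImportError."""
    saved = sys.modules.get(blocked, _ABSENT)
    sys.modules[blocked] = None  # a None entry halts `import blocked`
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore the previous state so later imports are unaffected.
        if saved is _ABSENT:
            sys.modules.pop(blocked, None)
        else:
            sys.modules[blocked] = saved
```

A call such as `import_with_blocked("tensorrt_llm", "tensorrt")` is only meaningful in a fresh interpreter: a module that is already cached in `sys.modules` is returned without re-running its imports.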

…tion

Make the unit test suite collectable after the TensorRT backend removal and clean up remaining stale references found by an audit.

- tests/unittest/utils/util.py: drop deleted TRT imports (torch_dtype_to_trt, trt_dtype_to_torch, ContextFMHAType, Session/TensorInfo) and the TRT-engine test helpers; keep the PyTorch/GPU helpers.
- redirect test imports: functional enums -> functional_enums, quantization weight-preproc -> quantization.preprocessing; stub/lazy the removed TRT LLM symbols in mixed test files; delete tests that are entirely TensorRT-backend.
- tensorrt_llm/_torch/auto_deploy/llm_args.py: remove the vestigial build_config field/validator/import (BuildConfig was deleted) so the preserved AutoDeploy backend imports again.
- serialization.py: drop allowlist entries for deleted builder.BuildConfig and plugin.plugin.PluginConfig.
- runtime.pyi: drop the removed TllmRuntime stub.
- usage/llm_args_golden_manifest.json: regenerate (no TrtLlmArgs / 'tensorrt').
- cpp iTensor.h: remove the unused nvinfer1 forward declaration.

Verified: tensorrt_llm + bindings + LLM + AutoDeploy import with tensorrt blocked; bindings.DataType parity holds; unit tests collect (the only remaining collection errors are an unrelated missing optional 'ray' dependency).

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
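A parity check of the kind mentioned under "Verified" could be sketched as follows. Both enums and the helper `enum_parity_diff` are hypothetical stand-ins: the member names and values are illustrative only and are not copied from bindings.DataType or from the AC-5.1 test.

```python
from enum import IntEnum


class ReferenceDataType(IntEnum):
    """Illustrative stand-in for the enum whose values must be preserved."""
    FLOAT = 0
    HALF = 1
    INT8 = 2
    INT32 = 3


class InternalDataType(IntEnum):
    """Illustrative stand-in for the replacement enum."""
    FLOAT = 0
    HALF = 1
    INT8 = 2
    INT32 = 3


def enum_parity_diff(reference, candidate):
    """Map each mismatching member name to (reference_value, candidate_value).

    A name missing on one side is reported with None for that side.
    An empty dict means the two enums agree on every name and value.
    """
    ref = {name: int(m.value) for name, m in reference.__members__.items()}
    cand = {name: int(m.value) for name, m in candidate.__members__.items()}
    diff = {}
    for name in sorted(ref.keys() | cand.keys()):
        pair = (ref.get(name), cand.get(name))
        if pair[0] != pair[1]:
            diff[name] = pair
    return diff
```

Comparing by name and integer value, rather than by enum identity, is what matters for serialized data: a renumbered member would silently change the meaning of previously written bytes.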

Remove legacy TensorRT-backend example directories and documentation pages, add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop TensorRT references from the surviving docs (index, release notes, torch arch/kv-cache overviews, and two blogs).

Pure docs/examples change: no Python or C++ source is touched, so the package build and import are unaffected. First of the incremental PRs that split Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and tools/plugin_gen) and the TensorRT-engine Triton backend paths under triton_backend/, plus the deleted triton_server integration tests.

Re-derive the test-list cleanup against current main (Freddy's draft predates main by ~165 commits, so his hunks no longer apply): drop the now-dangling entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT waives, and replace qa/llm_triton_integration.txt with the llmapi-backend triton tests. Entries were removed only where the referenced test path no longer exists on disk, so no surviving PyTorch test is affected.

No Python or C++ source is touched. Second of the incremental PRs splitting Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
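The rule "remove an entry only where the referenced test path no longer exists on disk" can be sketched as below. The entry format `path/to/test.py::test_id` and the helper name `prune_dangling` are assumptions for illustration; the real test-db YAML files may carry extra fields that this sketch ignores.

```python
import tempfile
from pathlib import Path


def prune_dangling(entries, root):
    """Split entries into (kept, dropped) by whether their file exists under root.

    Assumption: each entry looks like "relative/path.py::test_id"; only the
    part before the first "::" is treated as a filesystem path.
    """
    kept, dropped = [], []
    for entry in entries:
        rel_path = entry.split("::", 1)[0].strip()
        if (Path(root) / rel_path).exists():
            kept.append(entry)
        else:
            dropped.append(entry)
    return kept, dropped


if __name__ == "__main__":
    # Tiny demo on a throwaway directory with one surviving test file.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "unittest").mkdir()
        (Path(root) / "unittest" / "test_keep.py").write_text("")
        entries = ["unittest/test_keep.py::test_a", "unittest/trt/test_gone.py::test_b"]
        print(prune_dangling(entries, root))
```

Keying the decision on file existence alone keeps the cleanup mechanical and easy to audit: an entry is dropped only when the file it points at is gone.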

junq (generated by with_the_same_user script) added 3 commits June 22, 2026 10:54

github-actions Bot assigned QiJune Jun 22, 2026

Wanli-Jiang mentioned this pull request Jun 30, 2026

[None][chore] Remove legacy TensorRT examples, docs and tests NVIDIA/TensorRT-LLM#15760

Closed

1 task

QiJune wants to merge 3 commits into main from trtllm-remove-tensorrt-backend
