Remove tensorrt backend #20

Open
QiJune wants to merge 3 commits into main from trtllm-remove-tensorrt-backend
QiJune commented Jun 22, 2026

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

junq (generated by with_the_same_user script) added 3 commits June 22, 2026 10:54
Remove the TensorRT execution backend from the Python package so that
tensorrt_llm imports and runs without the `tensorrt` package installed.
PyTorch becomes the sole supported backend; autodeploy is preserved.
The C++ TensorRT-library (nvinfer1) decoupling lands in a follow-up.

Shared-symbol extraction (TensorRT-free homes, extract-before-delete):
- functional enums / AllReduce types / RopeEmbeddingUtils -> functional_enums.py
- QuantConfig/LayerQuantConfig -> quantization/quant_config.py; QuantAlgo redirect
- runtime ModelConfig -> runtime/model_config.py; CUASSERT -> _utils
- quant weight-preprocessing -> quantization/preprocessing.py
- PretrainedConfig/SpeculativeDecodingMode trimmed into a TensorRT-free
  models/modeling_utils.py; QuantModeWrapper kept TensorRT-free in _utils
- TensorRT-free package init (_init_runtime) replaces _common._init
- extract TensorRT-free CustomAllReduceHelper; relocate _is_building

Backend removal:
- delete builder/network/functional/graph_rewriting/_common/module/parameter/
  python_plugin/plugin/layers, TRT model impls, TRT runtime classes
  (Session/GenerationSession/ModelRunner*/EncDec/Multimodal), _tensorrt_engine,
  TRT quant (layers/quantize/functional), TRT tools, bench/build,
  trtllm-build/refit/prune commands
- reject backend="tensorrt"/"trt" with ValueError across LLM API and CLIs;
  drop trtllm-build/refit/prune console scripts and 'tensorrt' telemetry
- remove module-level _tensorrt_engine imports + dispatch in
  serve/eval/openai_server/bench/evaluate
- decouple executor workers from builder.Engine; decouple logger/profiler
  from tensorrt
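
The backend rejection described above could look roughly like the sketch below. It is only an illustration: `validate_backend`, `REMOVED_BACKENDS`, the default of `"pytorch"`, and the error wording are hypothetical names and choices, not the code in this PR. The only behavior taken from the commit message is that `"tensorrt"` and `"trt"` raise `ValueError`.

```python
from typing import Optional

# Hypothetical constant: the backend names this commit says are rejected.
REMOVED_BACKENDS = frozenset({"tensorrt", "trt"})


def validate_backend(backend: Optional[str]) -> str:
    """Return a normalized backend name, rejecting the removed TensorRT backend.

    Assumption: None falls back to "pytorch", since the commit message calls
    PyTorch the sole supported backend.
    """
    if backend is None:
        return "pytorch"
    name = backend.strip().lower()
    if name in REMOVED_BACKENDS:
        raise ValueError(
            f"backend={backend!r} is no longer supported: the TensorRT "
            "backend was removed. Use the PyTorch backend instead."
        )
    return name
```

Normalizing case before the check means `"TRT"` and `"TensorRT"` are rejected as well, which is a design choice of this sketch rather than something the commit message states.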

Sweep:
- remove TensorRT Triton engine backend paths + L0_backend_trtllm CI
- delete TensorRT docs + example dirs; add migration guide
- remove TensorRT unit tests and their CI test-list entries

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
Sever the C++ tree's compile- and link-time dependency on TensorRT so the
shared core (runtime, batch manager, executor API, KV cache, sampling, kernels,
nanobind bridge) builds, links, and runs without the TensorRT library. Follows
the Python TensorRT-backend removal; PyTorch is the sole backend.

Internal types (serialization-compatible):
- add tensorrt_llm::DataType/Dims/ILogger (common/tllmDataType.h); DataType
  enumerator values mirror nvinfer1::DataType for byte-compatible serialization
- migrate nvinfer1::DataType/Dims/ILogger -> tensorrt_llm:: across ~155 files;
  replace NvInfer.h includes with the internal header
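
A minimal sketch of what "enumerator values mirror nvinfer1::DataType" means for serialization. The Python `DataType` class below is a stand-in for the new internal enum, and its member names are assumptions. The reference values are the `nvinfer1::DataType` enumerators as I understand them from the public TensorRT headers, and should be checked against the header version actually in use.

```python
import struct
from enum import IntEnum


class DataType(IntEnum):
    """Stand-in for the internal enum; member names are hypothetical."""
    FLOAT = 0
    HALF = 1
    INT8 = 2
    INT32 = 3
    BOOL = 4
    UINT8 = 5
    FP8 = 6
    BF16 = 7
    INT64 = 8


# Reference values for nvinfer1::DataType (assumed; verify against NvInfer headers).
NVINFER1_DATATYPE = {
    "kFLOAT": 0, "kHALF": 1, "kINT8": 2, "kINT32": 3, "kBOOL": 4,
    "kUINT8": 5, "kFP8": 6, "kBF16": 7, "kINT64": 8,
}


def parity_mismatches(enum_cls, reference):
    """Return (name, internal_value, reference_value) for every disagreement."""
    mismatches = []
    for ref_name, ref_value in reference.items():
        member = enum_cls.__members__.get(ref_name[1:])  # strip the leading "k"
        if member is None or int(member) != ref_value:
            mismatches.append((ref_name, member, ref_value))
    return mismatches


def serialize(dtype):
    """Pack the enumerator as a little-endian int32, as a serializer might."""
    return struct.pack("<i", int(dtype))
```

If `parity_mismatches(DataType, NVINFER1_DATATYPE)` is empty, a buffer written with the old enum decodes to the same logical type under the new one, because only the integer value is stored.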

Remove the TensorRT-engine execution path (unused by the PyTorch backend):
- plugins/ (nvinfer_plugin_tensorrt_llm), engine runtime wrappers (tllmRuntime,
  tllmStreamReaders, layerProfiler, rawEngine, tllmLogger), TRT model adapters
  (trtGptModel*/trtEncoderModel/trtGptModelFactory), executor.cpp/executorImpl,
  the C++ Executor + TllmRuntime nanobind bindings, executor_worker,
  disaggServerUtil, the engine I/O buffers and engine-only logits/decoder algos
- relocate the retained KVCacheEvent ctor and executor::version() (which had
  lived in executorImpl/executor.cpp) into executor/kvCacheEvent.cpp
- decouple shared batch_manager files (inflightBatchingUtils, medusaBuffers,
  lookaheadBuffers) from the removed engine runtime, keeping the PyTorch surface

Build/packaging (no TensorRT):
- drop find_package(TensorRT)/TRT_LIB/NvInfer include injection from cpp CMake,
  the OnnxParser test link, the plugins/executor_worker subdirs and wheel target
- setup.py no longer packages libnvinfer_plugin_tensorrt_llm.so / executorWorker
- build_wheel.py: TensorRT_ROOT optional, drop the tensorrt install check
- requirements.txt: drop tensorrt

Verified: libtensorrt_llm.so + th_common + bindings link with no libnvinfer
NEEDED entry; import tensorrt_llm + bindings + LLM succeed with tensorrt blocked;
bindings.DataType members/values unchanged (AC-5.1 parity test added).
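
One way to check the "no libnvinfer NEEDED entry" claim is to parse the dynamic section printed by `readelf -d`. The sketch below is an assumption about how such a check could be scripted, not the verification used for this commit; the function names are hypothetical.

```python
import re
import subprocess

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libcudart.so.12]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Extract the NEEDED shared-library names from `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def links_tensorrt(readelf_output):
    """True if any NEEDED entry is a libnvinfer library."""
    return any(name.startswith("libnvinfer")
               for name in needed_libraries(readelf_output))


def read_dynamic_section(library_path):
    """Run `readelf -d` on a shared object (requires binutils on PATH)."""
    result = subprocess.run(["readelf", "-d", library_path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `links_tensorrt(read_dynamic_section("libtensorrt_llm.so"))` would be expected to return `False` after this change, assuming the library is in the current directory.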

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
…tion

Make the unit test suite collectable after the TensorRT backend removal and
clean up remaining stale references found by an audit.

- tests/unittest/utils/util.py: drop deleted TRT imports (torch_dtype_to_trt,
  trt_dtype_to_torch, ContextFMHAType, Session/TensorInfo) and the TRT-engine
  test helpers; keep the PyTorch/GPU helpers.
- redirect test imports: functional enums -> functional_enums, quantization
  weight-preproc -> quantization.preprocessing; stub/lazy the removed TRT LLM
  symbols in mixed test files; delete tests that are entirely TensorRT-backend.
- tensorrt_llm/_torch/auto_deploy/llm_args.py: remove the vestigial build_config
  field/validator/import (BuildConfig was deleted) so the preserved AutoDeploy
  backend imports again.
- serialization.py: drop allowlist entries for deleted builder.BuildConfig and
  plugin.plugin.PluginConfig.
- runtime.pyi: drop the removed TllmRuntime stub.
- usage/llm_args_golden_manifest.json: regenerate (no TrtLlmArgs / 'tensorrt').
- cpp iTensor.h: remove the unused nvinfer1 forward declaration.

Verified: tensorrt_llm + bindings + LLM + AutoDeploy import with tensorrt
blocked; bindings.DataType parity holds; unit tests collect (the only remaining
collection errors come from an unrelated, missing optional 'ray' dependency).

Signed-off-by: junq (generated by with_the_same_user script) <junq@ipp2-0160.ipp2a1.colossus.nvidia.com>
Wanli-Jiang added a commit to Wanli-Jiang/TensorRT-LLM that referenced this pull request Jun 30, 2026
Remove legacy TensorRT-backend example directories and documentation pages,
add docs/source/legacy/tensorrt-backend-removal.md migration guide, and drop
TensorRT references from the surviving docs (index, release notes, torch
arch/kv-cache overviews, and two blogs).

Pure docs/examples change: no Python or C++ source is touched, so the package
build and import are unaffected. First of the incremental PRs that split
Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Wanli-Jiang added a commit to Wanli-Jiang/TensorRT-LLM that referenced this pull request Jun 30, 2026
Delete the TensorRT-only test suites (tests/unittest/trt/**, the TRT builder/
runtime/plugin/quantization/medusa/lora-vs-trt tests under others/, llmapi, and
tools/plugin_gen) and the TensorRT-engine Triton backend paths under
triton_backend/, plus the deleted triton_server integration tests.

Re-derive the test-list cleanup against current main (Freddy's draft is ~165
commits behind main, so his hunks no longer apply): drop the now-dangling
entries from tests/integration/test_lists/test-db/*.yml and the stale TensorRT
waives, and replace qa/llm_triton_integration.txt with the llmapi-backend
triton tests. Entries were removed only where the referenced test path no
longer exists on disk, so no surviving PyTorch test is affected.
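
The pruning rule above (drop an entry only if its test path is gone from disk) can be sketched as follows. The entry format `path/to/test_file.py::test_name` and the function name `prune_dangling` are assumptions for illustration; the real test-list files may carry extra fields that this sketch does not handle.

```python
from pathlib import Path


def prune_dangling(entries, tests_root):
    """Split test-list entries into (kept, dropped) by whether the file exists.

    Assumes each entry starts with a path relative to `tests_root`,
    optionally followed by "::test_name".
    """
    root = Path(tests_root)
    kept, dropped = [], []
    for entry in entries:
        rel_path = entry.split("::", 1)[0].strip()
        if (root / rel_path).exists():
            kept.append(entry)
        else:
            dropped.append(entry)
    return kept, dropped
```

Because the decision depends only on the file system, an entry whose test file still exists is never removed, which matches the guarantee stated above that no surviving PyTorch test is affected.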

No Python or C++ source is touched. Second of the incremental PRs splitting
Freddy's TensorRT-removal draft (QiJune#20).

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>