fix(tensorrt): TRT 10.16.1.11 + modelopt install + run_pip quote-fix #1

Open
forkni wants to merge 2 commits into
dotsimulate:main from
forkni:main

@forkni forkni commented Apr 26, 2026

Summary

Bumps the installer to align with TRT 10.16.1.11 (first Blackwell-Windows-production release; fixes the 78% FP8 perf regression in 10.12–10.13 on SM_120) and adds the missing FP8-quant install block.

  • sd_installer/tensorrt.py: bump tensorrt_cu12 to 10.16.1.11, polygraphy to 0.49.26, and onnx-graphsurgeon to 0.6.1. Add the FP8-quant block (nvidia-modelopt[onnx] cupy-cuda12x==13.6.0 numpy==1.26.4), which was previously missing and caused a silent ImportError on fp8_quantize until the first FP8 build. Re-pin onnxruntime-gpu==1.24.4 with --no-deps after modelopt's transitive downgrade. Drop shell-style quotes inside package specs (run_pip uses subprocess + .split(), so quotes become literal arg chars).
  • sd_installer/installer.py: remove torchaudio from cu128 config (not needed); minor ruff format cleanup.
  • sd_installer/verifier.py: float32_to_bfloat16 diagnostic now points to onnx-graphsurgeon==0.6.1 instead of suggesting an onnx downgrade.
  • sd_installer/{cli.py, __init__.py, __main__.py}: ruff format cleanup (blank lines, unused import, raw docstring).
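
The quoting issue from the first bullet can be reproduced in isolation. This is a minimal sketch, assuming only what the bullet states (the command string is split with `str.split()` and passed to `subprocess` as a list). The command string itself is illustrative and not copied from `sd_installer`.

```python
import shlex

# Illustrative command string; the real specs live in sd_installer/tensorrt.py.
command = "install 'numpy==1.26.4' --no-deps"

# str.split() only splits on whitespace, so the quotes stay inside the argument.
naive_args = command.split()
print(naive_args)  # ['install', "'numpy==1.26.4'", '--no-deps']

# shlex.split() applies POSIX shell quoting rules and strips the quotes.
shell_args = shlex.split(command)
print(shell_args)  # ['install', 'numpy==1.26.4', '--no-deps']
```

With no shell in between, pip would receive `'numpy==1.26.4'` with the quote characters included, which is not a valid requirement spec. Dropping the quotes from the specs, as this PR does, avoids the problem without changing how the string is split.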

Companion PR

Pairs with dotsimulate/StreamDiffusion#12 — the main library work for TRT 10.16.1.11 + FP8 quantization. The installer fix here is a strict prerequisite: the StreamDiffusionTD COMP's Installtensorrt button installs from this repo's sd_installer/tensorrt.py, so without this PR merged the button continues to install TRT 10.12 even after the main PR lands.

Test Plan

  • Fresh-venv install: confirm pip list reports tensorrt_cu12==10.16.1.11, polygraphy==0.49.26, onnx-graphsurgeon==0.6.1, nvidia-modelopt>=0.19, onnxruntime-gpu==1.24.4 (--no-deps re-pin).
  • python -c "from streamdiffusion.acceleration.tensorrt.fp8_quantize import *; print('OK')" returns OK on a fresh install (pre-fix this would have ImportError'd on modelopt until the first FP8 build).
  • All 13 verifier checks pass.
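
The first test-plan item could also be checked from Python instead of reading `pip list` by eye. The sketch below is a hypothetical helper, not part of `sd_installer`. It uses the standard-library `importlib.metadata` and covers only the exact pins from the list above; `nvidia-modelopt>=0.19` is a floor rather than an exact pin, so it is left out.

```python
from importlib import metadata

# Exact pins taken from the test plan above; the helper names are hypothetical.
EXPECTED_PINS = {
    "tensorrt_cu12": "10.16.1.11",
    "polygraphy": "0.49.26",
    "onnx-graphsurgeon": "0.6.1",
    "onnxruntime-gpu": "1.24.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED_PINS).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Run inside the fresh venv, this prints nothing when every pin matches. The `lookup` parameter exists only so the comparison can be exercised without the packages installed.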

🤖 Generated with Claude Code

INTER-NYC and others added 2 commits April 23, 2026 14:39
- tensorrt.py: bump tensorrt_cu12 to 10.16.1.11, polygraphy 0.49.26,
  onnx-graphsurgeon 0.6.1; add FP8-quant block (modelopt + cupy-cuda12x
  + numpy re-lock); re-pin onnxruntime-gpu==1.24.4 with --no-deps after
  modelopt downgrade; drop shell-style quotes inside package specs
  (run_pip uses subprocess + .split(), quotes become literal arg chars).
- installer.py: remove torchaudio from cu128 config (not needed);
  minor ruff format cleanup.
- verifier.py: float32_to_bfloat16 diagnostic points to onnx-gs 0.6.1
  instead of suggesting an onnx downgrade.
- __init__.py, __main__.py, cli.py: ruff format cleanup (blank lines,
  unused import, raw docstring).
Fixes 6 CVEs patched in deps audit 2026-05-23:
- idna >=3.16 (CVE-2026-45409: punycode resource exhaustion)
- Mako >=1.3.12 (CVE-2026-44307: Windows backslash path traversal)
- urllib3 >=2.7.0 (CVE-2026-44432/44431: over-decompression, cross-origin redirect)

Added to MANUAL_PINS and installed in phase7_numpy_lock so upgrade
runs on both fresh and existing installs. Fresh pip resolves already
satisfy these floors; this ensures the minimum on partial updates.

pip and onnxruntime-gpu CVEs are handled separately:
- pip: phase1_foundation already runs --upgrade pip (gets latest)
- onnx 1.19.1: 6 CVEs deferred — 1.21.0 breaks FP8 quantization

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
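
The version floors in the second commit message can be illustrated with a small comparison helper. This is a simplified sketch under stated assumptions: the function names are hypothetical, `MANUAL_PINS` and `phase7_numpy_lock` are not reproduced here, and only the leading dotted-integer part of a version is compared (pre-release and local tags are ignored).

```python
import re

# Floors quoted from the commit message; the helper names are hypothetical.
SECURITY_FLOORS = {"idna": "3.16", "Mako": "1.3.12", "urllib3": "2.7.0"}


def release_tuple(version):
    """Parse the leading dotted-integer part of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def below_floor(installed, floor):
    """Return True when `installed` is older than `floor`."""
    a, b = release_tuple(installed), release_tuple(floor)
    # Pad the shorter tuple with zeros so that 2.7 and 2.7.0 compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

Comparing integer tuples rather than strings matters here: as strings, "3.7" sorts after "3.16", but as versions idna 3.7 is below the 3.16 floor. For full PEP 440 handling, `packaging.version.Version` is the more robust choice.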
2 participants