Vulkan: support Linux/Windows desktop GPUs and opt-in wheel builds#20138
Vulkan: support Linux/Windows desktop GPUs and opt-in wheel builds#20138Reubend wants to merge 1 commit into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20138
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 1 Unclassified FailureAs of commit 1904c5c with merge base e285edf ( NEW FAILURES - The following jobs have failed:
UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
|
427011e to
23db5fc
Compare
The Vulkan backend was developed with a focus on Android GPUs. This change
makes it build and run correctly on Linux and Windows desktop discrete GPUs
(NVIDIA/AMD/Intel) and adds an opt-in path to produce pre-built Vulkan binaries,
without changing behavior on Android.
The architecture was already largely portable (volk loader, headless compute,
optional extensions gated by availability, staging-buffer transfers), so the
work here is concentrated in build portability, a few discrete-GPU correctness
fixes, and CI/packaging plumbing. Every change is Android-safe by construction:
build divergence sits behind compile-time compiler/OS guards, AOT/policy changes
are opt-in and default to current behavior, and runtime device-behavior changes
key off queried capabilities so they resolve identically on a single-GPU
Adreno/Mali device.
Suggested review order:
1. Build portability -- backends/vulkan/CMakeLists.txt (per-compiler exception
flag, submodule check), cmake/ShaderLibrary.cmake (glslc discovery, graceful
skip in wheel builds), and runtime/vk_api/memory/vma_api.h (GCC/MSVC warning
suppression alongside the existing clang block).
2. Shader compilation -- runtime/graph/ops/glsl/coopmat_mm.yaml targets Vulkan
1.3 so GL_KHR_cooperative_matrix compiles, plus a NameError-safety fix in
runtime/gen_vulkan_spv.py.
3. Runtime correctness on discrete GPUs -- vk_api/Adapter.cpp enables the
shaderInt16/Int64/Float64 features the shaders already use; vk_api/Runtime.cpp
prefers a real GPU over software/integrated devices (with an ETVK_DEVICE_INDEX
override); api/Context.cpp guards blit against compute-only queues; and
api/containers/Tensor.cpp compares the storage-buffer size in bytes.
4. Ahead-of-time -- vulkan_preprocess.py, partitioner/vulkan_partitioner.py, and
utils.py wire the previously-dropped small_texture_limits option through the
compile-spec round-trip; test/test_vulkan_compile_options.py covers it.
5. Distribution and CI, all gated behind EXECUTORCH_BUILD_VULKAN so default
wheels and other backends are untouched -- tools/cmake/preset/pybind.cmake,
setup.py, .ci/scripts/wheel/*, .ci/scripts/setup-vulkan-*.{sh,ps1},
.github/workflows/test-backend-vulkan.yml (adds a real-GPU job), and
.github/workflows/vulkan-windows.yml (MSVC build validation).
Tested by building the backend with the Vulkan SDK and running fp32 and int8
models on an NVIDIA A100: outputs match the reference and all shaders compile
(cooperative-matrix as SPIR-V 1.6). The existing SwiftShader CI path is
unchanged.
This change was authored with Claude.
23db5fc to
1904c5c
Compare
|
I took a look at the 3 test failures that the CI found, and I don't think that they're caused by any of my changes here. Please let me know if you find anything that indicates differently. |
|
ack. Taking a look soon. |
SS-JIA
left a comment
There was a problem hiding this comment.
⏺ PR #20138 — changes by theme
Theme 1: Build portability (make it compile off-Android)
- backends/vulkan/CMakeLists.txt — per-compiler exception flag (/EHsc MSVC vs -fexceptions); submodule-existence check w/ actionable error.
- backends/vulkan/cmake/ShaderLibrary.cmake — find glslc via VULKAN_SDK HINTS (Windows); graceful skip in wheel builds.
- backends/vulkan/runtime/vk_api/memory/vma_api.h — warning suppression for GCC + MSVC (was clang-only).
- backends/vulkan/runtime/gen_vulkan_spv.py — NameError guard when cache_dir is None.
Theme 2: Shader compilation
- backends/vulkan/runtime/graph/ops/glsl/coopmat_mm.yaml — target Vulkan 1.3 so GL_KHR_cooperative_matrix (needs SPIR-V 1.6) compiles.
Theme 3: Runtime correctness on desktop GPUs
- vk_api/Adapter.cpp — enable int16/int64/float64 device features shaders already use.
- vk_api/Runtime.cpp — prefer discrete GPU over software/integrated; ETVK_DEVICE_INDEX override.
- api/Context.cpp — guard blit against compute-only queues.
- api/containers/Tensor.cpp — fix buffer-size check (bytes vs numel).
Theme 4: AOT compile-option round-trip fix
- vulkan_preprocess.py — deserialize small_texture_limits + skip_memory_planning (were silently dropped); fix tuple-mutation bug.
- partitioner/vulkan_partitioner.py — wire small_texture_limits into partition texture limits.
- utils.py — add SMALL_TEXTURE_LIMITS constant.
- test/test_vulkan_compile_options.py — regression test for round-trip.
Theme 5: Packaging (opt-in wheel)
- setup.py — build vulkan_backend target when EXECUTORCH_BUILD_VULKAN.
- tools/cmake/preset/pybind.cmake — enable Vulkan in Linux/Windows wheel if flag + glslc present.
Theme 6: CI infra
- .github/workflows/test-backend-vulkan.yml — new real-GPU (NVIDIA) job.
- .github/workflows/vulkan-windows.yml — new MSVC build job.
- .ci/scripts/test_backend.sh — pick real-gpu vs swiftshader ICD.
- .ci/scripts/setup-vulkan-linux-deps.sh — arg-based SwiftShader/real-GPU setup.
- .ci/scripts/setup-vulkan-windows-deps.ps1, setup-windows-msvc-vulkan.ps1 — Windows glslc + build.
- .ci/scripts/wheel/pre_build_script.sh — provision Vulkan SDK/submodules when flag set.
- .ci/scripts/wheel/test_linux.py, test_windows.py — check VulkanBackend registered.
All changes from Theme 1 through 5 look super solid! Thanks for putting together those fixes.
For Theme 6, just have a small request regarding the new CI tests added. Overall, the PR looks great!
I'll also import this PR as a diff just so that these changes can run against our Meta internal CI flows. Note that this diff won't be landed, I will abandon and unlink it from the PR once we get the CI signals.
| // expose compute-only queue families this could otherwise be invalid usage. | ||
| // On mobile the single universal queue always has these bits set. | ||
| VK_CHECK_COND( | ||
| queue_.capabilities & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_TRANSFER_BIT), |
There was a problem hiding this comment.
Potentially valid point raised by Claude:
Context.cpp blit guard — ⚠️ one real question. Guard allows GRAPHICS | TRANSFER. But per Vulkan spec vkCmdBlitImage requires GRAPHICS specifically — transfer queues do not support blit.
Compute+transfer-without-graphics queue passes check but blit still invalid usage. In practice selected compute queue on desktop usually has graphics, so works. Still: check should likely be
VK_QUEUE_GRAPHICS_BIT only. Strictly better than no guard (today), but the TRANSFER_BIT could let an invalid case slip silently.
If true, seems that only GRAPHICS_BIT should be checked.
| @@ -312,7 +356,7 @@ Runtime::Runtime(const RuntimeConfig config) | |||
| try { | |||
| switch (config.default_selector) { | |||
| case AdapterSelector::First: | |||
There was a problem hiding this comment.
optional, but with the updated behaviour it seems First is no longer accurate. Would recommend renaming to Auto, perhaps.
| @@ -0,0 +1,48 @@ | |||
| name: Test Vulkan Backend Windows Build | |||
There was a problem hiding this comment.
I would recommend creating a new vulkan.yml workflow file and just putting all vulkan CI jobs that require special runners there (i.e. build-vulkan-windows-msvc and test-vulkan-real-gpu, which perhaps should be renamed to test-vulkan-nvidia), for example:
# .github/workflows/vulkan.yml
name: Test Vulkan Backend (specialized runners)
on:
push:
branches: [main, release/*]
tags: [ciflow/nightly/*]
pull_request:
paths:
- .github/workflows/vulkan.yml
- backends/vulkan/**
- .ci/scripts/setup-vulkan-linux-deps.sh
- .ci/scripts/setup-vulkan-windows-deps.ps1
- .ci/scripts/setup-windows-msvc-vulkan.ps1
workflow_dispatch:
concurrency:
group:
${{ github.workflow }}-${{ github.event.pull_request.number || github.sha
}}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
permissions:
contents: read
jobs:
changed-files:
name: Get changed files
uses: ./.github/workflows/_get-changed-files.yml
with:
include-push-diff: true # so push commits can also be path-filtered
run-decision:
name: CI run decision
uses: ./.github/workflows/_ci-run-decision.yml
test-vulkan-nvidia:
needs: [changed-files, run-decision]
# Path-filtered: skip commits that don't touch Vulkan-relevant paths,
# except on sampled full runs (see _ci-run-decision.yml).
if: |
github.event_name != 'pull_request' && (
contains(needs.changed-files.outputs.changed-files, 'backends/vulkan/') ||
contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-vulkan-linux-deps.sh') ||
contains(needs.changed-files.outputs.changed-files, '.github/workflows/vulkan.yml') ||
needs.run-decision.outputs.is-full-run == 'true'
)
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
# ... (existing real-gpu config)
build-vulkan-windows-msvc:
needs: [changed-files, run-decision]
if: |
contains(needs.changed-files.outputs.changed-files, 'backends/vulkan/') ||
contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-vulkan-windows-deps.ps1') ||
contains(needs.changed-files.outputs.changed-files, '.ci/scripts/setup-windows-msvc-vulkan.ps1') ||
contains(needs.changed-files.outputs.changed-files, '.github/workflows/vulkan.yml') ||
needs.run-decision.outputs.is-full-run == 'true'
uses: pytorch/test-infra/.github/workflows/windows_job.yml@main
# ... (existing windows config)Also on the topic of test-vulkan-real-gpu / test-vulkan-nvidia, I think we can also consider:
- Have it replicate the workflow of both
test-vulkan-models-linuxandtest-vulkan-operators-linuxfor full coverage - Remove the
github.event_name != 'pull_request'check, since thechanged-filesmechanism +on:pull_request:pathsmechanism should drastically reduce the frequency at which this job is triggered.
|
@SS-JIA has imported this pull request. If you are a Meta employee, you can view this in D108371716. |
Summary
We already provide good desktop support for NVIDIA GPUs through our CUDA backend, which works well on both Linux and Windows. However, our Vulkan backend only provides solid support for Android, leaving AMD GPUs without sufficient support. That's a shame since Vulkan is well supported on every operating system and with every major GPU manufacturer.
This PR gets the Vulkan backend building and running correctly on Linux and Windows desktop GPUs (NVIDIA, AMD, Intel), and adds an opt-in way to build pre-built Vulkan binaries. It leaves everything Android related the same so that we don't regress anything for that platform.
Most of the backend was already portable, So this is mostly build fixes, a few small fixes for desktop GPUs, and packaging/CI plumbing.
fixes:#20140
Changes
ETVK_DEVICE_INDEXoverride for multi-GPU machines); avoids an invalid image-copy on compute-only queues; and fixes a buffer-size check that compared the wrong units.small_texture_limitsexport option work as intended. It was being silently dropped; now it round-trips so you can target GPUs with smaller texture limits. Adds a small unit test for it.EXECUTORCH_BUILD_VULKANflag, the wheel build can include Vulkan; adds a real-GPU (NVIDIA) test job and a Windows/MSVC build job. The default wheel and other backends are untouched.Android Safety
Every change is gated so Android behaves exactly as before: build differences are behind compiler/OS checks, the new export and runtime options are opt-in and default to today's behavior, and the device-selection / feature changes are based on what the GPU reports . The existing SwiftShader CI job is unchanged.
Testing
Built with the Vulkan SDK (glslc on PATH) and run on an NVIDIA A100 (driver 580.126.09, Vulkan 1.4.312):
I ran a small fp32 model and an int8 model on the A100 and matched the reference output (fp32 to 5 decimals, int8 to 4 decimals). The int8 run exercises the integer-dot-product / int16 shaders that SwiftShader can't run.
All the shaders compiled fine and the new unit tests added here passed.
I didn't test the windows build yet, so I'll be relying on CI for that.
TODO
We also need to publish a Vulkan wheel to PyPI. The build supports it (
EXECUTORCH_BUILD_VULKAN=1+ glslc), but we need a Vulkan entry added to the sharedbuild-wheels-*.ymlworkflows.cc @SS-JIA @manuelcandales @digantdesai @cbilgin