Comprehensive ROCm helper for centralized Windows package integration and installation support #1629

Draft
NeuralFault wants to merge 11 commits into LykosAI:main from NeuralFault:universal-rocm

NeuralFault commented May 2, 2026

Consolidates Windows ROCm support behind a shared helper and expands AMD GPU coverage across the current Windows-native ROCm path. The result is a more consistent install and launch flow for ROCm-capable packages, less duplicated package-specific logic, and broader support from Vega/GCN5 through the entire RDNA lineup. It also establishes the shared ROCm helper foundation that ComfyUI and Wan2GP now use directly, with the same model intended to be reused by other AMD/ROCm-capable packages going forward.

ROCm Support Integration and Refactoring:

  • Introduced the IRocmPackageHelper dependency to PackageFactory, ComfyUI, and Wan2GP, so ROCm compatibility checks, runtime/install context resolution, Windows-native package installation, and launch environment construction all flow through the same shared service instead of being reimplemented per package. This centralizes the Windows ROCm path and makes it easier to extend the same behavior to additional packages later. [1] [2] [3] [4] [5] [6] [7] [8]

  • Refactored the Windows ROCm path in ComfyUI around the shared helper and a package-owned WindowsRocmProfile, replacing hardcoded install/index handling with shared compatibility, install, and environment policy. This keeps the package-specific behavior limited to the pieces that actually need to remain package-specific while letting the helper own the common ROCm workflow. [1] [2] [3]

  • Updated ROCm support detection and launch environment injection to use the centralized helper throughout package startup. This makes ROCm eligibility checks, environment variable defaults, and runtime-specific overrides consistent across packages, while still allowing package-owned extras such as the ComfyUI-specific COMFYUI_ENABLE_MIOPEN. [1] [2] [3]
    This also fixes user-set environment variables (SM Settings > Environment Variables) failing to override the package-configured ones, which was caused by the immutability coded into the original Windows ROCm handling in ComfyUI.cs. The injection order is now: Helper Defaults > Package config > User-set. User-set variables are pulled from SettingsManager and applied as the last step before the environment is passed to the package's launch flow, so they take priority over any previously injected variable with the same key when the user wants to disable or modify a default.
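
The layering order described above (Helper Defaults > Package config > User-set) can be sketched roughly as follows. This is a minimal illustration, not the helper's actual code: the class name `LaunchEnvironmentSketch` and its `Merge` method are hypothetical, and the only assumption about behavior is that a later layer overwrites an earlier one when keys collide.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of last-writer-wins environment layering.
// Not the real helper API; names are illustrative only.
public static class LaunchEnvironmentSketch
{
    // Merges layers in the order given. A later layer overwrites an
    // earlier one when both define the same variable name.
    public static Dictionary<string, string> Merge(
        params IReadOnlyDictionary<string, string>[] layers
    )
    {
        // Environment variable names are case-insensitive on Windows,
        // so keys are compared without regard to case.
        var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (var layer in layers)
        {
            foreach (var pair in layer)
            {
                merged[pair.Key] = pair.Value;
            }
        }

        return merged;
    }
}
```

Calling `LaunchEnvironmentSketch.Merge(helperDefaults, packageConfig, userSet)` with the user-set dictionary last means that a user who sets, for example, COMFYUI_ENABLE_MIOPEN to a different value than the package preset ends up with their own value in the final launch environment.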

GPU Architecture and Compatibility Improvements:

  • Extended the AMD GPU architecture detection matrix in GpuInfo to cover a wider set of Vega/GCN5, RDNA1, RDNA2, RDNA3/3.5, and related handheld/mobile variants, and added handling for the R9600 RDNA4 Pro GPU, which was previously absent. This improves Windows ROCm coverage across the supported lineup instead of limiting support to a narrower subset of cards, since TheRock ROCm Technical Preview PyTorch builds now exist for Vega dGPUs, RDNA1 dGPUs, and practically the entire RDNA2 family. Coverage now runs from Vega 56 all the way to RX 9070/R9700. Radeon Instinct Vega/CDNA HPC/datacenter GPUs remain excluded because they have no official Windows driver support and/or need custom hacked drivers. [1] [2]

  • Refactored ROCm compatibility checks in GpuInfo and HardwareHelper to use shared WindowsRocmSupport logic, removing duplicated support tables and consolidating both support detection and architecture-based policy decisions into a single source of truth owned by the ROCm helper. GpuInfo now handles only GPU-name > gfxarch translation, and HardwareHelper only extracts hardware information from the OS for the GPUs installed on the user's system.
    This reduces adding gfxarch translation for future GPUs, along with their installation indexes and any special environment handling, to changes in 2 dedicated files, which then apply automatically to every package wired into the ROCm helper's call paths. [1] [2]
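
As a rough illustration of the single-source-of-truth idea above, the sketch below keeps one table keyed by gfx architecture and derives both the support check and the index lookup from it. Everything here is an assumption made for illustration: the class `RocmSupportMapSketch`, its method names, the two table entries, and the placeholder `example.invalid` URLs are hypothetical and are not taken from WindowsRocmSupport.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch: one table drives both "is this architecture
// supported?" and "which package index does it install from?".
public static class RocmSupportMapSketch
{
    // Placeholder entries and URLs, for illustration only.
    private static readonly Dictionary<string, string> IndexUrlByGfxArch =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["gfx900"] = "https://example.invalid/rocm/gfx900/",
            ["gfx1100"] = "https://example.invalid/rocm/gfx1100/",
        };

    // An architecture counts as supported only if the table has an entry for it.
    public static bool IsSupported(string? gfxArch) =>
        gfxArch is not null && IndexUrlByGfxArch.ContainsKey(gfxArch);

    // Returns the index URL for a known architecture, or null otherwise.
    public static string? TryGetIndexUrl(string? gfxArch) =>
        gfxArch is not null && IndexUrlByGfxArch.TryGetValue(gfxArch, out var url)
            ? url
            : null;
}
```

Because both methods read the same table, adding a new architecture is a one-line change that updates the support check and the install index together, which is the property the refactor is aiming for.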

Other Installation and Launch Improvements:

  • Adjusted ComfyUI launch defaults to account for Windows ROCm architecture differences, including defaulting legacy Windows ROCm GPUs to Quad Cross-Attention where that remains the better default. This keeps package launch behavior aligned with the shared architecture policy without hardcoding that policy in multiple places. [1] [2]

  • Updated package launch environment injection to use ROCm helper-generated variables and shared defaults, so ROCm-enabled installs consistently receive the expected runtime configuration on Windows while still leaving room for package-specific additions where needed. [1] [2] [3]


In Progress / Considerations for further development while in Draft PR status:
  • Expand the shared Windows ROCm helper wiring into additional packages, particularly A3 WebUI and SDForge variants (Forge / reForge), so they can reuse the same compatibility checks, package selection, install flow, and launch environment defaults now used by ComfyUI and Wan2GP. This would bring the helper's expanded support to the most popular remaining packages.
  • Improve SwarmUI Windows ROCm integration so the default ROCm launch environment is passed through when SwarmUI starts its self-launched ComfyUI backend, while also layering in user-set variables from SM Settings. Previously, only user-set variables were passed from SettingsManager, so the variables hardcoded in ComfyUI's original Windows ROCm handling never reached the SwarmUI Comfy Self-Start backend, which degraded performance unless the user set them manually in SM Settings.
  • Evaluate adding Flash Attention 2 support for legacy-architecture installs through AMD AITER using custom builds by 0xDELUXA, either as a default path where stable enough or as an optional Package Command action, possibly gated to RDNA2 and older. (RDNA3+ gets its own FA2 via AOTriton, which is enabled by default on modern architectures.)
  • Evaluate adding Sage Attention 1.0 as an optional Package Command install, including any required ops patching (sourced from the ComfyUI-Zluda installation scripting needed for the Windows ROCm environment, along with triton-windows), potentially gated to RDNA2 and older. (AOTriton Flash Attention 2 on RDNA3+ is preferable for modern architectures.)
  • Consider defaulting Windows ROCm installs to the runtime-only ROCm module set rather than the full ROCm + SDK module set, with the fuller SDK components installable later through Package Commands if needed. The full SDK modules add considerably to the size of the venv, so installing only the ROCm core runtime modules may be preferable, but this needs testing to verify. The SDK modules could then become an optional install for users who intend to compile or build modules that need them.
  • Consider offering an optimized ROCm-aware bitsandbytes install path through the Package Command menu for packages and workflows that benefit from it. These would be prebuilt optimized wheels by 0xDELUXA that cover the entire Win-ROCm supported GPU lineup, improving fp8 and other low-precision quantization performance on GPUs that support it (support variance needs verification).
Other future considerations:
  • Refactoring and package integration for future Linux install support, passing the same default environment variables for a better user experience and performance on AMD GPUs when running Stability Matrix on Linux.
  • Expanding package integration to include other WebUI packages such as InvokeAI, AI Toolkit, Trainers, etc.
  • Refactor build index selection and handling to use the new multi-arch URL format from TheRock, consolidating the ROCm/PyTorch installation URLs into a single universal index with the gfxarch for the current GPU applied in the pip command.

NeuralFault and others added 8 commits April 21, 2026 19:02
- Add initial ROCm helper structure
- Set up ROCm helper foundation
Compile test successful.
- Add initial ROCm helper calls/config
- Removed pre-existing Windows ROCm blocks which will be obsolete following helper implementation
- Move Windows ROCm install/bootstrap logic into shared ROCm helper
- Add gfx-family mapping for Windows-native TheRock ROCm URLs
- Route ComfyUI Win Rocm installs through helper-resolved ROCm runtime, rocm-sdk, and pytorch setup
- Prevent requirements.txt from overwriting helper-installed ROCm torch packages
- Add helper-owned post-install torch verification and improve unsupported GPU failure handling
…ntime/install/environment API and simplify the ROCm profile/context models around the helper’s real responsibilities

- add a centralized Windows ROCm support map so GPU detection, architecture support checks, and package index resolution all use the same source of truth
- expand AMD architecture detection to cover additional RDNA4, Steam Deck, RDNA1, and Vega-class GPUs used by the Windows ROCm support path
- add a helper-managed Windows ROCm bootstrap flow that installs the ROCm runtime, initializes/reinitializes the SDK, aligns rocm-sdk-devel with the resolved torch build, and verifies both torch ROCm metadata and runtime availability
- centralize ROCm launch environment construction in the helper, including default MIOpen, allocator, flash-attention, and AOTriton settings plus legacy SDP fallback, RDNA1 overrides, and user env override layering
- switch ComfyUI to helper-driven Windows ROCm compatibility and launch env handling, and default legacy Windows ROCm GPUs to quad cross-attention while keeping Comfy-specific MIOpen enablement as a preset
- integrate Wan2GP with the shared Windows ROCm helper for install and launch flows, while updating its Linux ROCm path to use upstream rocm7.2 torch/vision/audio installs
- wire the ROCm helper through package construction and add focused test coverage for ROCm build/version parsing, runtime failure classification, and Windows ROCm support/index resolution
- centralize Windows ROCm architecture classification and legacy-attention fallback policy in WindowsRocmSupport
- move ComfyUI-specific MIOpen env handling out of the helper and into package-owned ROCm config
- reuse shared ROCm policy for ComfyUI quad-attention defaults and helper-managed AOTriton / math SDP / RDNA1 gates
- remove dead ROCm preset plumbing and trim unused RocmPackageProfile surface
- rename helper/package methods for clearer default-policy semantics

gemini-code-assist Bot left a comment

Code Review

This pull request centralizes Windows ROCm support by introducing a shared IRocmPackageHelper service and associated models, refactoring ComfyUI and Wan2GP to use this new framework. The changes include expanded GPU architecture detection and standardized installation and environment configuration logic. Feedback identifies a missing dependency injection for Wan2GP in the factory and a configuration regression in ComfyUI's memory allocation settings. Further suggestions focus on optimizing the helper by reducing redundant hardware probing, avoiding unnecessary hardware refreshes, and using more appropriate exception types.

Comment thread StabilityMatrix.Core/Helper/Factory/PackageFactory.cs
Comment thread StabilityMatrix.Core/Models/Packages/ComfyUI.cs
Comment on lines +34 to +74
    public RocmRuntimeContext ResolveRuntimeContext(
        string installLocation,
        InstalledPackage installedPackage,
        RocmPackageProfile profile
    )
    {
        _ = installLocation;
        _ = installedPackage;

        var compatibility = BuildCompatibilityResult(profile);
        if (!compatibility.IsCompatible)
        {
            return new RocmRuntimeContext
            {
                IsSupported = false,
                FailureReason = compatibility.FailureReason,
                SelectedGpu = compatibility.SelectedGpu,
                RuntimeGfxArch = compatibility.ResolvedGfxArch,
            };
        }

        var supportedAmdGpus = GetAmdGpuCandidates(forceRefresh: true)
            .Where(IsSupportedWindowsRocmGpu)
            .ToList();

        var selectedGpu =
            compatibility.SelectedGpu
            ?? TryResolvePreferredAmdGpu(supportedAmdGpus, settingsManager.Settings.PreferredGpu)
            ?? supportedAmdGpus.FirstOrDefault();

        var runtimeGfxArch =
            compatibility.ResolvedGfxArch
            ?? selectedGpu?.GetAmdGfxArch()
            ?? GetSupportedFallbackGfxArch(supportedAmdGpus);

        return new RocmRuntimeContext
        {
            IsSupported = true,
            SelectedGpu = selectedGpu,
            RuntimeGfxArch = runtimeGfxArch,
        };

medium

The ResolveRuntimeContext method performs redundant hardware probing and logic that is already handled by BuildCompatibilityResult. Since BuildCompatibilityResult already resolves the SelectedGpu and ResolvedGfxArch, you can simplify this method significantly and avoid multiple expensive hardware enumerations.

    public RocmRuntimeContext ResolveRuntimeContext(
        string installLocation,
        InstalledPackage installedPackage,
        RocmPackageProfile profile
    )
    {
        _ = installLocation;
        _ = installedPackage;

        var compatibility = BuildCompatibilityResult(profile);
        return new RocmRuntimeContext
        {
            IsSupported = compatibility.IsCompatible,
            FailureReason = compatibility.FailureReason,
            SelectedGpu = compatibility.SelectedGpu,
            RuntimeGfxArch = compatibility.ResolvedGfxArch,
        };
    }

Comment on lines +78 to +103
    public RocmInstallContext ResolveInstallContext(
        string installLocation,
        InstalledPackage installedPackage,
        RocmPackageProfile profile
    )
    {
        _ = installLocation;
        _ = installedPackage;

        var supportedAmdGpus = GetAmdGpuCandidates(forceRefresh: true)
            .Where(IsSupportedWindowsRocmGpu)
            .ToList();

        var preferredGfxArch = TryResolvePreferredAmdGfxArch(
            supportedAmdGpus,
            settingsManager.Settings.PreferredGpu
        );

        var runtimeGfxArch = preferredGfxArch ?? GetSupportedFallbackGfxArch(supportedAmdGpus);
        var windowsNativeIndexUrl = WindowsRocmSupport.TryGetPackageIndexUrl(runtimeGfxArch);

        return new RocmInstallContext
        {
            RuntimeGfxArch = runtimeGfxArch,
            RocmPackageIndexUrl = windowsNativeIndexUrl,
        };

medium

Similar to ResolveRuntimeContext, this method performs redundant hardware probing. You can leverage BuildCompatibilityResult to get the resolved architecture and avoid another hardware enumeration.

    public RocmInstallContext ResolveInstallContext(
        string installLocation,
        InstalledPackage installedPackage,
        RocmPackageProfile profile
    )
    {
        _ = installLocation;
        _ = installedPackage;

        var compatibility = BuildCompatibilityResult(profile);
        var windowsNativeIndexUrl = WindowsRocmSupport.TryGetPackageIndexUrl(compatibility.ResolvedGfxArch);

        return new RocmInstallContext
        {
            RuntimeGfxArch = compatibility.ResolvedGfxArch,
            RocmPackageIndexUrl = windowsNativeIndexUrl,
        };
    }

Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated