[SPIR-V] Add descriptor heap stride specialization constant attribute#8520
jzakharovnv wants to merge 4 commits into
@jzakharovnv please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
✅ With the latest revision this PR passed the C/C++ code formatter.
dd74c49 to 3c656ed
Building off of microsoft#8281, this commit adds a native lowering via SPV_EXT_descriptor_heap and SPV_KHR_untyped_pointers. ResourceDescriptorHeap and SamplerDescriptorHeap are lowered to untyped variables decorated with ResourceHeapEXT and SamplerHeapEXT. Each heap access emits OpUntypedAccessChainKHR into a runtime array of the appropriate descriptor type. Buffer-like resources (StructuredBuffer, ByteAddressBuffer, ConstantBuffer, TextureBuffer) use OpTypeBufferEXT and OpBufferPointerEXT; image and sampler resources use OpLoad. Interlocked operations on RWTexture use OpUntypedImageTexelPointerEXT. Requires -fspv-target-env=vulkan1.3. Assisted-by: Claude.
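The split between the buffer path and the load path described in this commit message can be pictured as a small classification function. This is a minimal sketch, not DXC's code: every enum, type, and function name below is hypothetical, and only the decoration names come from the message above.

```cpp
#include <string_view>

// Hypothetical resource categories; DXC's own type classification differs.
enum class HeapResourceKind {
  StructuredBuffer,
  ByteAddressBuffer,
  ConstantBuffer,
  TextureBuffer,
  Image,
  Sampler,
};

// How the descriptor is materialized after OpUntypedAccessChainKHR.
enum class HeapLowering {
  BufferPointer, // OpTypeBufferEXT element + OpBufferPointerEXT
  Load,          // OpLoad of the image or sampler descriptor
};

// Buffer-like resources take the buffer-pointer path; images and samplers
// are loaded. Interlocked operations on RWTexture (which the message says
// use OpUntypedImageTexelPointerEXT) are not modeled here.
constexpr HeapLowering loweringFor(HeapResourceKind kind) {
  switch (kind) {
  case HeapResourceKind::StructuredBuffer:
  case HeapResourceKind::ByteAddressBuffer:
  case HeapResourceKind::ConstantBuffer:
  case HeapResourceKind::TextureBuffer:
    return HeapLowering::BufferPointer;
  case HeapResourceKind::Image:
  case HeapResourceKind::Sampler:
    return HeapLowering::Load;
  }
  return HeapLowering::Load;
}

// Decoration placed on the untyped heap variable.
constexpr std::string_view heapDecoration(bool isSamplerHeap) {
  return isSamplerHeap ? "SamplerHeapEXT" : "ResourceHeapEXT";
}
```

For instance, `loweringFor(HeapResourceKind::ConstantBuffer)` yields the buffer-pointer path, while `HeapResourceKind::Sampler` yields the load path.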
3c656ed to 083f3a2
Extends the SPV_EXT_descriptor_heap native heap lowering to cover RaytracingAccelerationStructure resources loaded from ResourceDescriptorHeap. Acceleration structure descriptors are accessed via OpUntypedAccessChainKHR into a runtime array of OpTypeAccelerationStructureKHR, consistent with the image and sampler paths added in the previous commit.
…-heap-stride CLI flags
Adds two new command-line flags that override the ArrayStride of the descriptor heap runtime arrays emitted by -fspv-use-descriptor-heap. -fvk-resource-heap-stride <N> and -fvk-sampler-heap-stride <M> set the stride for the ResourceDescriptorHeap and SamplerDescriptorHeap arrays, respectively. N and M must each be a power of two in [8, 256]. When set, the CLI value takes the highest precedence.
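The range rule for N and M can be written as a one-line check. The sketch below only illustrates the constraint as stated in this commit message; the function name is hypothetical and is not taken from DXC.

```cpp
#include <cstdint>

// Hypothetical helper: true when `stride` is a power of two in [8, 256],
// i.e. one of 8, 16, 32, 64, 128, 256.
constexpr bool isValidHeapStride(std::uint32_t stride) {
  // A nonzero value is a power of two when exactly one bit is set.
  const bool isPowerOfTwo = stride != 0 && (stride & (stride - 1)) == 0;
  return isPowerOfTwo && stride >= 8 && stride <= 256;
}
```

Under this rule a value such as 96 is rejected even though it lies inside the range, because it is not a power of two.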
Adds [[vk::resource_heap_stride_constant_id(id)]] and [[vk::sampler_heap_stride_constant_id(id)]] attributes that emit the descriptor heap ArrayStride as a SPIR-V specialization constant using ArrayStrideIdEXT, allowing applications to override the stride at pipeline creation time via VkSpecializationInfo without recompiling the shader. The attribute initializer supplies the default stride value and must be a power of two in [8, 256]. The CLI flags -fvk-resource-heap-stride and -fvk-sampler-heap-stride take higher precedence and suppress these attributes with a warning when both are specified.
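The precedence between the CLI flag and the attribute can be sketched as follows. This is an illustration of the rule in the commit message, not the PR's implementation: the struct names, the function name, the warning text, and the `fallbackStride` parameter (what is emitted when neither source is present) are all assumptions.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical view of one stride attribute: its constant id and the
// default stride supplied by the attribute initializer.
struct StrideAttribute {
  std::uint32_t constantId;
  std::uint32_t defaultStride;
};

// Hypothetical result of resolving the stride for one heap.
struct StrideDecision {
  bool useSpecConstant;     // true: emit ArrayStrideIdEXT with a spec constant
  std::uint32_t stride;     // literal stride, or the spec constant's default
  std::uint32_t constantId; // meaningful only when useSpecConstant is true
  std::string warning;      // non-empty when the attribute was suppressed
};

StrideDecision resolveHeapStride(std::optional<std::uint32_t> cliStride,
                                 std::optional<StrideAttribute> attr,
                                 std::uint32_t fallbackStride) {
  if (cliStride) {
    // The CLI value wins; a competing attribute is dropped with a warning.
    StrideDecision d{false, *cliStride, 0, ""};
    if (attr)
      d.warning = "heap stride attribute ignored: overridden by CLI flag";
    return d;
  }
  if (attr)
    return StrideDecision{true, attr->defaultStride, attr->constantId, ""};
  // Neither source present: assumed literal fallback.
  return StrideDecision{false, fallbackStride, 0, ""};
}
```

With both a CLI stride of 32 and an attribute defaulting to 128, the result is a literal stride of 32 plus a warning; with only the attribute, the result is a specialization constant whose default is 128.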
083f3a2 to 901ceae
@microsoft-github-policy-service agree company="NVIDIA"
I've reviewed this as best I can, and everything looks fine to me code-wise in terms of using the heap extension. My only feedback is the same as I gave offline: we really need the descriptor heap stride to be calculated via OpConstantSizeOf by default, but I know that's in hand. Noting it here in case any other readers come across it 😊
[numthreads(1, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID) {
  StructuredBuffer<uint> input = ResourceDescriptorHeap[0];
; SPIR-V
; Version: 1.6
; Generator: Google spiregg; 0
; Bound: 99
; Schema: 0
OpCapability DescriptorHeapEXT
OpCapability Shader
OpExtension "SPV_EXT_descriptor_heap"
OpExtension "SPV_KHR_untyped_pointers"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID %outputBytes %resource_heap
OpExecutionMode %main LocalSize 1 1 1
OpSource HLSL 660
OpName %type_RWByteAddressBuffer "type.RWByteAddressBuffer"
OpName %outputBytes "outputBytes"
OpName %type_untyped_pointer "type.untyped.pointer"
OpName %resource_heap "resource_heap"
OpName %main "main"
OpName %type_buffer_ext "type.buffer.ext"
OpName %type_StructuredBuffer_uint "type.StructuredBuffer.uint"
OpName %type_RWStructuredBuffer_uint "type.RWStructuredBuffer.uint"
OpName %type_ByteAddressBuffer "type.ByteAddressBuffer"
OpName %type_buffer_ext_0 "type.buffer.ext"
OpName %type_ConstantBuffer_Constants "type.ConstantBuffer.Constants"
OpMemberName %type_ConstantBuffer_Constants 0 "value"
OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
OpDecorate %resource_heap BuiltIn ResourceHeapEXT
OpDecorate %outputBytes DescriptorSet 0
OpDecorate %outputBytes Binding 0
OpDecorate %_runtimearr_uint ArrayStride 4
OpMemberDecorate %type_RWByteAddressBuffer 0 Offset 0
OpDecorate %type_RWByteAddressBuffer Block
OpDecorate %_runtimearr_type_buffer_ext ArrayStride 64
OpMemberDecorate %type_StructuredBuffer_uint 0 Offset 0
OpMemberDecorate %type_StructuredBuffer_uint 0 NonWritable
OpDecorate %type_StructuredBuffer_uint Block
OpMemberDecorate %type_RWStructuredBuffer_uint 0 Offset 0
OpDecorate %type_RWStructuredBuffer_uint Block
OpMemberDecorate %type_ByteAddressBuffer 0 Offset 0
OpMemberDecorate %type_ByteAddressBuffer 0 NonWritable
OpDecorate %type_ByteAddressBuffer Block
OpDecorate %_runtimearr_type_buffer_ext_0 ArrayStride 64
OpMemberDecorate %type_ConstantBuffer_Constants 0 Offset 0
OpDecorate %type_ConstantBuffer_Constants Block
%uint = OpTypeInt 32 0
%uint_0 = OpConstant %uint 0
%uint_1 = OpConstant %uint 1
%uint_2 = OpConstant %uint 2
%uint_3 = OpConstant %uint 3
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%uint_4 = OpConstant %uint 4
%_runtimearr_uint = OpTypeRuntimeArray %uint
%type_RWByteAddressBuffer = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_RWByteAddressBuffer = OpTypePointer StorageBuffer %type_RWByteAddressBuffer
%v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%type_untyped_pointer = OpTypeUntypedPointerKHR UniformConstant
%void = OpTypeVoid
%28 = OpTypeFunction %void
%type_buffer_ext = OpTypeBufferEXT StorageBuffer
%_runtimearr_type_buffer_ext = OpTypeRuntimeArray %type_buffer_ext
%type_StructuredBuffer_uint = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_StructuredBuffer_uint = OpTypePointer StorageBuffer %type_StructuredBuffer_uint
%type_RWStructuredBuffer_uint = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_RWStructuredBuffer_uint = OpTypePointer StorageBuffer %type_RWStructuredBuffer_uint
%type_ByteAddressBuffer = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_ByteAddressBuffer = OpTypePointer StorageBuffer %type_ByteAddressBuffer
%type_buffer_ext_0 = OpTypeBufferEXT Uniform
%_runtimearr_type_buffer_ext_0 = OpTypeRuntimeArray %type_buffer_ext_0
%type_ConstantBuffer_Constants = OpTypeStruct %uint
%_ptr_Uniform_type_ConstantBuffer_Constants = OpTypePointer Uniform %type_ConstantBuffer_Constants
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
%_ptr_Uniform_uint = OpTypePointer Uniform %uint
%outputBytes = OpVariable %_ptr_StorageBuffer_type_RWByteAddressBuffer StorageBuffer
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
%resource_heap = OpUntypedVariableKHR %type_untyped_pointer UniformConstant
%_ptr_Input_uint = OpTypePointer Input %uint
%main = OpFunction %void None %28
%36 = OpLabel
%37 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_0
%38 = OpBufferPointerEXT %_ptr_StorageBuffer_type_StructuredBuffer_uint %37
%39 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_1
%40 = OpBufferPointerEXT %_ptr_StorageBuffer_type_RWStructuredBuffer_uint %39
%41 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_2
%42 = OpBufferPointerEXT %_ptr_StorageBuffer_type_ByteAddressBuffer %41
%43 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext_0 %resource_heap %uint_3
%44 = OpBufferPointerEXT %_ptr_Uniform_type_ConstantBuffer_Constants %43
%45 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_0
%46 = OpBufferPointerEXT %_ptr_StorageBuffer_type_StructuredBuffer_uint %45
%47 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%48 = OpLoad %uint %47
%49 = OpBitcast %int %48
%50 = OpAccessChain %_ptr_StorageBuffer_uint %46 %int_0 %49
%51 = OpLoad %uint %50
%52 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_2
%53 = OpBufferPointerEXT %_ptr_StorageBuffer_type_ByteAddressBuffer %52
%54 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%55 = OpLoad %uint %54
%56 = OpIMul %uint %55 %uint_4
%57 = OpShiftRightLogical %uint %56 %uint_2
%58 = OpAccessChain %_ptr_StorageBuffer_uint %53 %uint_0 %57
%59 = OpLoad %uint %58
%60 = OpIAdd %uint %51 %59
%61 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext_0 %resource_heap %uint_3
%62 = OpBufferPointerEXT %_ptr_Uniform_type_ConstantBuffer_Constants %61
%63 = OpAccessChain %_ptr_Uniform_uint %62 %int_0
%64 = OpLoad %uint %63
%65 = OpIAdd %uint %60 %64
%66 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%67 = OpLoad %uint %66
%68 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_1
%69 = OpBufferPointerEXT %_ptr_StorageBuffer_type_RWStructuredBuffer_uint %68
%70 = OpAccessChain %_ptr_StorageBuffer_uint %69 %int_0 %67
OpStore %70 %65
%71 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%72 = OpLoad %uint %71
%73 = OpIMul %uint %72 %uint_4
%74 = OpShiftRightLogical %uint %73 %uint_2
%75 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%76 = OpLoad %uint %75
%77 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_1
%78 = OpBufferPointerEXT %_ptr_StorageBuffer_type_RWStructuredBuffer_uint %77
%79 = OpAccessChain %_ptr_StorageBuffer_uint %78 %int_0 %76
%80 = OpLoad %uint %79
%81 = OpAccessChain %_ptr_StorageBuffer_uint %outputBytes %uint_0 %74
OpStore %81 %80
%82 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_4
%83 = OpBufferPointerEXT %_ptr_StorageBuffer_type_StructuredBuffer_uint %82
%84 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_4
%85 = OpBufferPointerEXT %_ptr_StorageBuffer_type_StructuredBuffer_uint %84
%86 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%87 = OpLoad %uint %86
%88 = OpIMul %uint %87 %uint_4
%89 = OpIAdd %uint %88 %uint_4
%90 = OpShiftRightLogical %uint %89 %uint_2
%91 = OpUntypedAccessChainKHR %type_untyped_pointer %_runtimearr_type_buffer_ext %resource_heap %uint_4
%92 = OpBufferPointerEXT %_ptr_StorageBuffer_type_StructuredBuffer_uint %91
%93 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %int_0
%94 = OpLoad %uint %93
%95 = OpBitcast %int %94
%96 = OpAccessChain %_ptr_StorageBuffer_uint %92 %int_0 %95
%97 = OpLoad %uint %96
%98 = OpAccessChain %_ptr_StorageBuffer_uint %outputBytes %uint_0 %90
OpStore %98 %97
OpReturn
OpFunctionEnd
Thanks for the great PR. The overall implementation looks good and this is a very helpful addition. I really appreciate the work you put into supporting this feature.
I left a couple of questions/comments about the generated SPIR-V output, mainly around duplicated descriptor-heap access sequences and the ArrayStride vs ArrayStrideIdEXT decoration. Could you take a look when you have a chance?
Q1:
The generated SPIR-V appears to contain many unused IDs and duplicated resource-heap access sequences. For example, %37/%38, %45/%46, %52/%53, %68/%69, %77/%78, %84/%85, and %91/%92 all recreate buffer pointers from the descriptor heap, even when they refer to the same heap index and resource type. Is this expected or acceptable behavior in DXC?
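To make the question concrete, the kind of deduplication being asked about could look like the memoization below. This is only a sketch of the idea, not a description of how DXC emits or optimizes these sequences: the class and its interface are hypothetical, and it assumes all accesses sit in a single basic block so that a cached result id is always valid at the point of reuse.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical cache of buffer pointers already created from the heap,
// keyed by (runtime array type id, pointer result type id, heap index).
class HeapAccessCache {
public:
  // Returns the cached result id for the key, calling `emit` only the
  // first time the key is seen. `emit` stands in for emitting the
  // OpUntypedAccessChainKHR + OpBufferPointerEXT pair and returning the
  // id of the resulting pointer.
  template <typename EmitFn>
  std::uint32_t getOrEmit(std::uint32_t arrayTypeId,
                          std::uint32_t pointerTypeId,
                          std::uint32_t heapIndex, EmitFn emit) {
    const Key key{arrayTypeId, pointerTypeId, heapIndex};
    const auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second;
    const std::uint32_t id = emit();
    cache_.emplace(key, id);
    return id;
  }

private:
  using Key = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;
  std::map<Key, std::uint32_t> cache_;
};
```

With such a cache, repeated requests for the same array type, pointer type, and heap index would reuse one pointer id instead of producing a new access-chain pair each time.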
Q2:
Per Tobski's earlier comment, another issue is that the descriptor-heap runtime arrays are currently decorated with a literal ArrayStride 64. For example:
OpDecorate %_runtimearr_type_buffer_ext_0 ArrayStride 64
For OpTypeBufferEXT, should DXC instead emit OpConstantSizeOf to produce a stride ID, and then use OpDecorateId with ArrayStrideIdEXT? For example:
%stride = OpConstantSizeOf %uint %type_buffer_ext_0
OpDecorateId %_runtimearr_type_buffer_ext_0 ArrayStrideIdEXT %stride
This would avoid hardcoding the descriptor-heap buffer stride as 64 and derive it from the OpTypeBufferEXT element size instead. Is the current ArrayStride 64 output expected, or should DXC emit the ArrayStrideIdEXT pattern here?
Building off of #8519, this PR adds [[vk::resource_heap_stride_constant_id(id)]] and [[vk::sampler_heap_stride_constant_id(id)]] attributes that emit the descriptor heap ArrayStride as a SPIR-V specialization constant using ArrayStrideIdEXT, allowing applications to override the stride at pipeline creation time via VkSpecializationInfo without recompiling the shader. It is part 4/4 in a series.
The attribute initializer supplies the default stride value and must be a power of two in [8, 256]. The CLI flags -fvk-resource-heap-stride and -fvk-sampler-heap-stride take higher precedence and suppress these attributes with a warning when both are specified.
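On the application side, overriding the two strides at pipeline creation amounts to filling in specialization data. The sketch below uses local stand-in structs that mirror the field layout of Vulkan's `VkSpecializationMapEntry` and `VkSpecializationInfo` so that it compiles without the Vulkan headers; the stand-in names, the helper, and the assumption that both constants are 32-bit unsigned integers are illustrative and not taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for VkSpecializationMapEntry (same fields, same order).
struct SpecMapEntry {
  std::uint32_t constantID;
  std::uint32_t offset;
  std::size_t size;
};

// Stand-in for VkSpecializationInfo (same fields, same order).
struct SpecInfo {
  std::uint32_t mapEntryCount;
  const SpecMapEntry *pMapEntries;
  std::size_t dataSize;
  const void *pData;
};

// Hypothetical holder for the two stride overrides. It owns the data the
// SpecInfo points at, so it must outlive pipeline creation.
struct HeapStrideSpecialization {
  std::uint32_t strides[2]; // [0] resource heap, [1] sampler heap
  SpecMapEntry entries[2];

  SpecInfo info() const {
    return SpecInfo{2, entries, sizeof(strides), strides};
  }
};

inline HeapStrideSpecialization
makeHeapStrideSpecialization(std::uint32_t resourceConstantId,
                             std::uint32_t resourceStride,
                             std::uint32_t samplerConstantId,
                             std::uint32_t samplerStride) {
  constexpr std::uint32_t kSize =
      static_cast<std::uint32_t>(sizeof(std::uint32_t));
  HeapStrideSpecialization s{};
  s.strides[0] = resourceStride;
  s.strides[1] = samplerStride;
  // Each entry maps a constant id to its 4-byte slot in `strides`.
  s.entries[0] = SpecMapEntry{resourceConstantId, 0, kSize};
  s.entries[1] = SpecMapEntry{samplerConstantId, kSize, kSize};
  return s;
}
```

The constant ids passed in would be the ids given to the two attributes in the shader; in real code the stand-ins would be replaced by the Vulkan types and the result attached to the shader stage's `pSpecializationInfo`.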
Assisted by an AI agent.
@dnovillo