Separate query and storage alignment requirements [MOD-13837]#946
Conversation
Split the single alignment hint carried by PreprocessorsContainerAbstract
into query_alignment and storage_alignment so different SIMD kernels
(e.g. SQ8 storage vs FP32 query) can declare different alignment needs.
- spaces/spaces.h: add combineAlignments() helper and document the
GetDistFunc asymmetric-types contract (alignment hint refers to the
first / storage operand only).
- spaces/IP_space.cpp, spaces/L2_space.cpp: SQ8<->FP32 and SQ8<->SQ8
dispatchers now publish per-kernel alignment hints.
- spaces/computer/preprocessors.h: PreprocessorInterface preprocess()
and preprocessForStorage() carry storage_alignment / query_alignment.
CosinePreprocessor uses combineAlignments() for the shared-buffer
path. Removed the same-size preprocess() overload from the interface.
- spaces/computer/preprocessor_container.{h,cpp}: split alignment field
into query_alignment and storage_alignment; legacy single-arg
constructors retained as wrappers. Added getQueryAlignment() /
getStorageAlignment(); getAlignment() kept as a deprecated alias
returning storage_alignment.
- index_factories/components/preprocessors_factory.h: extend
PreprocessorsContainerParams and CreatePreprocessorsContainer overloads
to accept both alignments; single-alignment overloads kept for callers
that don't need the distinction.
- vec_sim_index.h: DataBlocksContainer now built with
getStorageAlignment(); added getQueryAlignment() / getStorageAlignment()
accessors.
- tests/unit/test_components.cpp, tests/unit/test_hnsw_tiered.cpp:
updated mock preprocessors and fixtures to the new interface.
- tests/unit/test_spaces.cpp: SelfDistanceCosine / SelfDistanceL2 for
SQ8<->FP32 now disable AVX2_FMA / AVX2 / SSE4 before the
no-optimization assertion, mirroring the rest of the spaces tests.
QuantPreprocessor::preprocessForStorage() now allocates the quantized storage blob with allocate_aligned() when storage_alignment != 0, so SQ8 storage actually honors the per-kernel alignment hint plumbed through PreprocessorsContainerAbstract.
New tests:
- PreprocessorsTest.QuantizationAsymmetricAlignment: drives the quant preprocessor with query_alignment=32 and storage_alignment=16 and asserts that preprocess(), preprocessForStorage(), and preprocessQuery() each return a pointer aligned to its respective hint.
- SpacesTest.SQ8_FP32_DispatcherAlignmentHints / SQ8_SQ8_DispatcherAlignmentHints: assert the exact alignment-hint values published by the x86 SQ8 dispatchers (AVX512 -> 16/32, AVX2_FMA -> 8, AVX2 -> 8, SSE4 -> 4), plus the no-optimization 0-case, for IP / L2 / Cosine.
VecSimAllocator::allocate_aligned(size, 0) already falls through to allocate(size), so the helper's alignment ternary was redundant. Inline the four call sites to a direct allocate_aligned() call.
Remove the temporary getAlignment() forwarders from PreprocessorsContainerAbstract and VecSimIndexAbstract. All callers in test_bruteforce, test_hnsw_tiered, and test_allocator now use the explicit getStorageAlignment() accessor, which is what every site already meant (DataBlock allocation overhead, brute-force storage, homogeneous-alignment fixture).
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR.
Pull request overview
This PR threads separate query/storage alignment hints through the preprocessor pipeline so asymmetric kernels can request different SIMD alignments for stored vectors and queries. It updates the core preprocessor/container interfaces, publishes new SQ8 dispatcher hints, and adjusts unit tests to cover the new alignment behavior.
Changes:
- Split the old single alignment hint into `query_alignment` and `storage_alignment` across preprocessors, containers, factories, and index accessors.
- Updated cosine/quantization preprocessing paths and x86 SQ8 dispatchers to use the new alignment semantics.
- Refreshed unit tests to validate asymmetric quantization alignment and dispatcher hint values.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/unit/test_spaces.cpp | Adds SQ8 self-distance coverage for more x86 feature branches and new dispatcher-alignment hint tests. |
| tests/unit/test_hnsw_tiered.cpp | Updates test preprocessor overrides to the new alignment-aware interface and accessor. |
| tests/unit/test_components.cpp | Refactors dummy preprocessors/tests for separate query vs. storage alignment and adds asymmetric quantization coverage. |
| tests/unit/test_bruteforce.cpp | Switches alignment sanity test to the new storage-specific accessor. |
| tests/unit/test_allocator.cpp | Updates memory-accounting expectations to use storage alignment instead of the removed generic accessor. |
| src/VecSim/vec_sim_index.h | Replaces the old alignment accessor with query/storage variants and uses storage alignment for vector block allocation. |
| src/VecSim/spaces/spaces.h | Documents asymmetric GetDistFunc alignment semantics and adds combineAlignments. |
| src/VecSim/spaces/computer/preprocessors.h | Changes the preprocessor interface to accept separate query/storage alignments and updates cosine/quantization implementations. |
| src/VecSim/spaces/computer/preprocessor_container.h | Stores separate query/storage alignment fields and propagates them through multi-preprocessor execution. |
| src/VecSim/spaces/computer/preprocessor_container.cpp | Makes the default aligned-copy helper use query alignment specifically. |
| src/VecSim/spaces/L2_space.cpp | Publishes alignment hints for SQ8 L2 dispatchers on x86. |
| src/VecSim/spaces/IP_space.cpp | Publishes alignment hints for SQ8 IP/Cosine dispatchers on x86. |
| src/VecSim/index_factories/components/preprocessors_factory.h | Extends preprocessor factory params/overloads to carry separate query and storage alignment hints. |
```cpp
// Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
#ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
        if (dim % 16 == 0) // SQ8 chunk = 16 bytes
            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_IP_implementation_AVX512F_BW_VL_VNNI(dim);
```
This matches the long-standing convention used by every existing dispatcher in IP_space.cpp / L2_space.cpp (FP32, FP64, BF16, FP16, INT8, UINT8). E.g. IP_FP32_GetDistFunc writes *alignment only inside the optimized branches and leaves it untouched on the scalar fallback. The contract — honored by the single production caller CreateIndexComponents — is that the caller initializes *alignment = 0 before invoking GetDistFunc. Flipping that to "fallback unconditionally writes 0" would have to be done across every *_GetDistFunc for consistency, which is out of scope for MOD-13837.
```cpp
// Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
#ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
        if (dim % 16 == 0) // SQ8 chunk = 16 bytes
            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_Cosine_implementation_AVX512F_BW_VL_VNNI(dim);
```
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
```cpp
// AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
    if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
        *alignment = 32 * sizeof(uint8_t);
    return Choose_SQ8_SQ8_Cosine_implementation_AVX512F_BW_VL_VNNI(dim);
```
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
```cpp
// Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
#ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
        if (dim % 16 == 0) // SQ8 chunk = 16 bytes; no point in aligning if there's a residual
            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_L2_implementation_AVX512F_BW_VL_VNNI(dim);
```
Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
```cpp
// No-optimization path must leave the hint at 0.
unsigned char alignment = 0;
(void)get(dim, &alignment, &opt);
ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0";
```
The test asserts the documented contract: caller initializes *alignment = 0, fallback leaves it untouched. Seeding a non-zero sentinel would test a stricter contract that no *_GetDistFunc in this codebase implements — FP32, FP64, BF16, FP16, INT8, and UINT8 all leave the output untouched on the scalar fallback. Aligning the SQ8 tests to that same convention is intentional.
```cpp
static inline unsigned char combineAlignments(unsigned char a, unsigned char b) {
    assert((a == 0 || (a & (a - 1)) == 0) && "alignment must be a power of two or zero");
    assert((b == 0 || (b & (b - 1)) == 0) && "alignment must be a power of two or zero");
    return a > b ? a : b;
```
```cpp
// AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
    if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
        *alignment = 32 * sizeof(uint8_t);
    return Choose_SQ8_SQ8_IP_implementation_AVX512F_BW_VL_VNNI(dim);
```
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
```cpp
// AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
    if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
        *alignment = 32 * sizeof(uint8_t);
    return Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(dim);
```
Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
```cpp
// No-optimization path must leave the hint at 0.
unsigned char alignment = 0;
(void)get(dim, &alignment, &opt);
ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0";
```
Same response as the SQ8_FP32 test above: the assertion matches the documented contract (caller initializes *alignment = 0, fallback leaves it untouched), which is what every other *_GetDistFunc in the codebase implements. A non-zero sentinel would test a stricter contract that doesn't exist anywhere in the dispatcher layer.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b9f8af9. Configure here.
```cpp
// Before:
blob = this->allocator->allocate(storage_bytes_count);
// After:
blob = storage_alignment
           ? this->allocator->allocate_aligned(storage_bytes_count, storage_alignment)
           : this->allocator->allocate(storage_bytes_count);
```
Inconsistent ternary kept in preprocessForStorage allocation
Low Severity
`QuantPreprocessor::preprocessForStorage` retains the `storage_alignment ? allocate_aligned(...) : allocate(...)` ternary, while the PR description explicitly states that `allocate_aligned(size, 0)` is equivalent to `allocate(size)` and the ternary can be dropped. Within the same class, `preprocessQuery` already calls `allocate_aligned` unconditionally, and `CosinePreprocessor::preprocessForStorage` also drops the ternary. This inconsistency within the same class is confusing and contradicts the stated intent of the change.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main     #946    +/-   ##
========================================
+ Coverage   96.71%   96.96%   +0.25%
========================================
  Files         129      130       +1
  Lines        8057     7785     -272
========================================
- Hits         7792     7549     -243
+ Misses        265      236      -29
```

View full report in Codecov by Sentry.


Describe the changes in the pull request
Split the single `alignment` hint carried by the preprocessor stack into two independent values, `query_alignment` and `storage_alignment`, so that storage and query buffers can satisfy different SIMD alignment requirements. This is the prerequisite plumbing for asymmetric kernels such as SQ8↔FP32, where the storage operand (e.g. SQ8 → 16 B) and the query operand (e.g. FP32 → 32 B) have different preferences.

Key changes:
- `PreprocessorInterface::preprocess` now takes both `storage_alignment` and `query_alignment`. The legacy single-alignment overload at the lowest layer is removed; the same-blob path is decided by the concrete preprocessor (`CosinePreprocessor` shares, `QuantPreprocessor` does not).
- `PreprocessorsContainerAbstract` / `MultiPreprocessorsContainer` store both fields and expose `getQueryAlignment()` / `getStorageAlignment()`. The deprecated `getAlignment()` alias has been removed; all callers now explicitly pick one.
- `CreatePreprocessorsContainer` and `CreatePreprocessorsContainerParams` get an asymmetric overload that takes both alignments; the existing single-alignment overload mirrors the value into both fields, so all current production callers are unchanged behaviorally.
- `CosinePreprocessor` allocates its shared buffer with `spaces::combineAlignments(storage_alignment, query_alignment)` (the strictest power-of-two that satisfies both consumers).
- `QuantPreprocessor::preprocessForStorage` honors `storage_alignment` when allocating the quantized blob.
- `GetDistFunc` is documented in `spaces.h`: the returned hint refers to the first operand only; callers that need both must query both dispatchers and combine.
- The SQ8 dispatchers (`IP_space.cpp`, `L2_space.cpp`) now publish per-feature alignment hints (AVX512 → 16 / 32, AVX2_FMA → 8, AVX2 → 8, SSE4 → 4, no-opt → 0) for both the SQ8↔SQ8 and SQ8↔FP32 paths.
- `VecSimAllocator::allocate_aligned(size, 0)` is documented/exercised as equivalent to `allocate(size)`, allowing the preprocessor allocation sites to drop the `alignment ? aligned : plain` ternary and call the aligned variant unconditionally.

No in-tree production caller currently produces asymmetric alignments:
`CreateIndexComponents` is templated on a single `DataType`, queries one dispatcher, and forwards the same hint to both fields. The asymmetric path is reachable today only by callers that build the components manually (or by the disk-backed quantized index in the follow-up `vecsim_disk` repo, where `query_alignment = fullAlignment; storage_alignment = combineAlignments(quantizedAlignment, mixedAlignment)`).

Test changes:
- `PreprocessorsTest.QuantizationAsymmetricAlignment` (new) exercises `query_alignment=32`, `storage_alignment=16` through `preprocess`, `preprocessForStorage`, and `preprocessQuery`, asserting independent pointer alignment for both blobs.
- `SpacesTest.SQ8_FP32_DispatcherAlignmentHints` and `SpacesTest.SQ8_SQ8_DispatcherAlignmentHints` (new) directly assert the alignment hint published by the IP / L2 / Cosine dispatchers per CPU feature, including the no-optimization 0-case.
- Tests that used `alignment = 5` were bumped to `alignment = 8` so they satisfy the new `combineAlignments` power-of-two invariant.
- `test_allocator`, `test_bruteforce`, and `test_hnsw_tiered` were migrated from `getAlignment()` to `getStorageAlignment()` (every site was already storage-semantic).
- Full `ctest -j 4` sweep: 2150 / 2150 passing.

Which issues this PR fixes
Main objects this PR modified
- `PreprocessorInterface` / `CosinePreprocessor` / `QuantPreprocessor` (src/VecSim/spaces/computer/preprocessors.h)
- `PreprocessorsContainerAbstract` / `MultiPreprocessorsContainer` (src/VecSim/spaces/computer/preprocessor_container.{h,cpp})
- `CreatePreprocessorsContainer` / `CreatePreprocessorsContainerParams` (src/VecSim/index_factories/components/preprocessors_factory.h)
- `VecSimIndexAbstract` accessors (src/VecSim/vec_sim_index.h)
- src/VecSim/spaces/IP_space.cpp and src/VecSim/spaces/L2_space.cpp
- `spaces::combineAlignments` and the asymmetric-types contract documentation (src/VecSim/spaces/spaces.h)

Mark if applicable
Note
Medium Risk
Touches core preprocessing/allocation paths and SIMD-dispatcher alignment hints, so mis-propagated alignment could cause performance regressions or subtle crashes on optimized kernels. Changes are mostly plumbing with added tests to guard behavior.
Overview
Adds separate alignment hints for query vs stored vectors across the preprocessor stack, replacing the single `alignment` value with `query_alignment` and `storage_alignment` (with backward-compatible single-alignment overloads).

Updates preprocessors/containers to propagate and honor the correct alignment per allocation path (including shared-buffer cases via `spaces::combineAlignments`), switches index storage allocation to use the storage alignment explicitly, and extends SQ8 distance dispatchers to publish per-ISA alignment hints for the SQ8 (first/storage) operand.

Tests are adjusted to use power-of-two alignments, callers are migrated from `getAlignment()` to `getStorageAlignment()`, and new unit tests assert asymmetric alignment behavior and exact dispatcher hint values.