Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks [MOD-15141] by dor-forer · Pull Request #942 · RedisAI/VectorSimilarity

dor-forer · 2026-04-29T11:25:31Z

Describe the changes in the pull request

Adds asymmetric SQ8-to-FP16 distance support (scalar path only) for L2, IP, and Cosine, where stored vectors are SQ8-quantized (uint8 + FP32 metadata) and queries are FP16. This is the first step (P1a) of MOD-15141; SIMD kernels (P1b/P1c) and QuantPreprocessor FP16 input support (P1a-2) are intentionally out of scope and tracked as follow-ups.

The scalar kernels use the algebraic identity dist = ||x||² + ||y||² - 2 * IP(x, y) with precomputed FP32 sums-of-squares stored alongside both the SQ8 storage and FP16 query, which keeps the quantization-error path consistent between the kernel and the test baseline. Each FP16 query element is widened to FP32 during accumulation; SQ8 metadata (min, delta, x_sum, x_sum_squares) and FP16 query metadata (y_sum, y_sum_squares) are read as FP32.

The new GetDistFunc<sq8, float, float16> specialization routes through IP_SQ8_FP16_GetDistFunc, Cosine_SQ8_FP16_GetDistFunc, and L2_SQ8_FP16_GetDistFunc. These dispatchers currently always return the scalar implementation; SIMD chooser slots are reserved for P1b (MOD-15152) and P1c (MOD-15153).

Which issues this PR fixes

MOD-15141

Main objects this PR modified

src/VecSim/spaces/IP/IP.{h,cpp} — adds SQ8_FP16_InnerProduct_Impl, SQ8_FP16_InnerProduct, and SQ8_FP16_Cosine.
src/VecSim/spaces/L2/L2.{h,cpp} — adds SQ8_FP16_L2Sqr using the algebraic identity over the precomputed sums-of-squares.
src/VecSim/spaces/IP_space.{h,cpp} — adds IP_SQ8_FP16_GetDistFunc and Cosine_SQ8_FP16_GetDistFunc dispatchers.
src/VecSim/spaces/L2_space.{h,cpp} — adds L2_SQ8_FP16_GetDistFunc dispatcher.
src/VecSim/spaces/spaces.cpp — adds the GetDistFunc<vecsim_types::sq8, float, vecsim_types::float16> specialization wiring L2/IP/Cosine to the new dispatchers.
tests/utils/tests_utils.h — adds populate_sq8_fp16_query and SQ8_FP16_NotOptimized_{InnerProduct,Cosine,L2Sqr} baselines (the L2 baseline mirrors the kernel's algebraic identity to avoid quantization-error drift at high dim).
tests/unit/test_spaces.cpp — adds SQ8_FP16_NoOptimizationSpacesTest parameterized over 19 dimensions (including odd and SIMD-boundary residues: 1, 5, 7, 8, 9, 15, 16, 17, 31, 32, 33, 47, 48, 49, 63, 64, 65, 127, 128) for L2/IP/Cosine, plus SQ8_FP16_EdgeCases (zero query, constant storage, mixed-sign), SpacesTest.GetDistFuncSQ8FP16Asymmetric, and SpacesTest.GetDistFuncInvalidMetricSQ8ToFP16.
tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp (new) + tests/benchmark/CMakeLists.txt — registers naive scalar benchmarks for L2/IP/Cosine; SIMD benchmark blocks will be added by P1b/P1c when the chooser symbols become available.

Test results

All 1096 test_spaces tests pass locally.
bm_spaces_sq8_fp16 builds and runs successfully.

Follow-ups (separate stories, out of scope here)

MOD-15152 (P1b) — x86 SIMD kernels (AVX2 / AVX512 / AVX512_VNNI / SSE4) and benchmark wiring.
MOD-15153 (P1c) — ARM SIMD kernels (NEON / SVE / SVE2) and benchmark wiring.
MOD-15141 P1a-2 — QuantPreprocessor FP16 input support (no plumbing into the index types yet).

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

Note

Medium Risk
Touches core distance computation and dispatch for a new datatype combination; primary risk is subtle correctness/UB issues around mixed-type layouts and unaligned metadata reads, mitigated by added load_unaligned and broad test coverage.

Overview
Adds asymmetric SQ8 storage + FP16 query support for IP, Cosine, and L2 by introducing new scalar kernels (SQ8_FP16_InnerProduct_Impl/SQ8_FP16_InnerProduct/SQ8_FP16_Cosine and SQ8_FP16_L2Sqr) that widen FP16 elements to FP32 and use precomputed SQ8/query metadata.

Wires the new mode into the public dispatch path via GetDistFunc<sq8, float, float16> and new *_SQ8_FP16_GetDistFunc helpers (currently always returning scalar implementations), and adds load_unaligned to safely read trailing FP32 metadata for odd dimensions.

Extends validation and performance tooling with a new bm_spaces_sq8_fp16 benchmark target plus unit tests (including odd-dimension unaligned-metadata and edge cases) and new SQ8-FP16 baseline/query-population utilities in tests_utils.h.

^{Reviewed by Cursor Bugbot for commit 304450d. Bugbot is set up for automated code reviews on this repo. Configure here.}

jit-ci · 2026-04-29T11:27:10Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

^{Security scan by Jit}

jit-ci · 2026-04-29T11:30:03Z

❌ Jit Scanner failed - Our team is investigating

Jit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions.

💡 Need to bypass this check? Comment @sera bypass to override.

Copilot

Pull request overview

Adds asymmetric SQ8-storage → FP16-query distance support (scalar path) for L2, Inner Product, and Cosine in VecSim, including dispatch plumbing plus unit tests and benchmarks.

Changes:

Implement scalar SQ8↔FP16 kernels for IP/Cosine and L2² using stored/query FP32 metadata (sum / sum-of-squares) and algebraic identities.
Add SQ8→FP16 GetDistFunc specialization and metric-specific dispatchers (currently scalar-only; SIMD chooser slots reserved).
Add test utilities, unit tests (including edge cases + parameterized dimensions), and a new benchmark target for SQ8→FP16 spaces.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/VecSim/spaces/IP/IP.cpp`	Adds SQ8→FP16 inner-product implementation used by IP/Cosine/L2.
`src/VecSim/spaces/IP/IP.h`	Declares SQ8→FP16 IP/Cosine APIs and shared IP implementation.
`src/VecSim/spaces/L2/L2.cpp`	Adds SQ8→FP16 L2² implementation using precomputed norms + IP.
`src/VecSim/spaces/L2/L2.h`	Declares SQ8→FP16 L2² API.
`src/VecSim/spaces/IP_space.cpp`	Adds SQ8→FP16 dispatcher stubs returning scalar implementations.
`src/VecSim/spaces/IP_space.h`	Declares SQ8→FP16 IP/Cosine dispatcher APIs.
`src/VecSim/spaces/L2_space.cpp`	Adds SQ8→FP16 L2 dispatcher stub returning scalar implementation.
`src/VecSim/spaces/L2_space.h`	Declares SQ8→FP16 L2 dispatcher API.
`src/VecSim/spaces/spaces.cpp`	Wires `GetDistFunc<sq8, float, float16>` to the new dispatchers.
`tests/utils/tests_utils.h`	Adds SQ8→FP16 query population/preprocess helpers and baselines.
`tests/unit/test_spaces.cpp`	Adds SQ8→FP16 correctness tests (basic, getters, param dims, edge cases).
`tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp`	Introduces SQ8→FP16 scalar benchmarks for L2/IP/Cosine.
`tests/benchmark/CMakeLists.txt`	Registers the new `bm_spaces_sq8_fp16` benchmark target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 68703d4. Configure here.}

codecov · 2026-04-29T11:47:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.93%. Comparing base (5bcc53e) to head (304450d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
+ Coverage   96.71%   96.93%   +0.22%     
==========================================
  Files         129      130       +1     
  Lines        8057     7712     -345     
==========================================
- Hits         7792     7476     -316     
+ Misses        265      236      -29

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…mentations

…nchmark setup

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ccess - Add load_unaligned<T> helper to VecSim/utils/alignment.h. - Refactor SQ8_FP16_InnerProduct_Impl and SQ8_FP16_L2Sqr to use it for the misaligned FP32 metadata reads, replacing inline std::memcpy. - Apply the same helper in tests/utils/tests_utils.h reference helpers (SQ8_FP16_NotOptimized_InnerProduct, SQ8_FP16_NotOptimized_L2Sqr), fixing the same alignment UB pattern that remained in the test path. - Add SQ8_FP16_l2sqr_odd_dim_unaligned_metadata_test that exercises the L2 kernel with deterministically misaligned storage and query metadata addresses (asserts the metadata bytes are not 4-byte aligned before invoking the kernel).

…6 implementations

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address review comments on SQ8_FP16 test alignment: - Allocate FP16 query buffers in test_spaces.cpp / bm_spaces_sq8_fp16.cpp as std::vector<float16> / new float16[] (with extra slots for FP32 metadata) so the production SQ8_FP16_* kernels' typed float16* loads never see a misaligned pointer. - Harden the test-only helpers (populate/preprocess_sq8_fp16_query and SQ8_FP16_NotOptimized_*) to access float16 values via memcpy on uint16_t, so they remain safe under any caller alignment.

…ction

lerman25 · 2026-04-30T17:06:44Z

+template <typename T>
+static inline T load_unaligned(const void *ptr) {
+    static_assert(std::is_trivially_copyable_v<T>,
+                  "load_unaligned requires a trivially-copyable T");
+    T value;
+    std::memcpy(&value, ptr, sizeof(T));
+    return value;
+}


Same alignment concern as #discussion_r3161710275, but on the existing SQ8_FP32 / SQ8_SQ8 kernels — they read FP32 metadata via a direct reinterpret_cast<const float*> and have the same misalignment for odd dim. Can we either apply load_unaligned there too in this PR, or open a follow-up so we don't end up with two conventions for the same problem in adjacent functions?

You're right.
This is not in the scope of this task - I opened a ticket for this:
https://redislabs.atlassian.net/browse/MOD-15303

lerman25

Good job!

MOD-15141 Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks

a617e59

dor-forer changed the title ~~MOD-15141 Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks~~ Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks [MOD-15141] Apr 29, 2026

dor-forer requested a review from Copilot April 29, 2026 11:26

Copilot started reviewing on behalf of dor-forer April 29, 2026 11:27 View session

refactor: Clean up code formatting in IP and test utility files

68703d4

dor-forer marked this pull request as ready for review April 29, 2026 11:28

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread src/VecSim/spaces/IP/IP.cpp Outdated

Comment thread src/VecSim/spaces/L2/L2.cpp Outdated

Comment thread tests/utils/tests_utils.h Outdated

Comment thread tests/unit/test_spaces.cpp

Comment thread tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp Outdated

cursor Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp Outdated

dor-forer added 2 commits April 29, 2026 16:03

fix: Use memcpy for aligned access to FP32 metadata in SQ8-FP16 imple…

5d86035

…mentations

refactor: Correct variable types and memory allocation in SQ8-FP16 be…

e3e55b0

…nchmark setup

dor-forer requested a review from Copilot April 29, 2026 13:13

Copilot started reviewing on behalf of dor-forer April 29, 2026 13:14 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread src/VecSim/spaces/IP/IP.cpp Outdated

Comment thread src/VecSim/spaces/L2/L2.cpp Outdated

Comment thread tests/utils/tests_utils.h Outdated

dor-forer requested a review from Copilot April 29, 2026 14:14

Copilot started reviewing on behalf of dor-forer April 29, 2026 14:14 View session

fix: Use memcpy for safe access to quantization parameters in SQ8-FP1…

b72c5ec

…6 implementations

dor-forer requested a review from lerman25 April 29, 2026 14:19

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/utils/tests_utils.h

Comment thread tests/unit/test_spaces.cpp Outdated

dor-forer added 2 commits April 29, 2026 17:45

fix: format code for better readability in SQ8_FP16 inner product fun…

304450d

…ction

dor-forer mentioned this pull request Apr 30, 2026

Quant preprocessor fp16 [MOD-15269] #944

Open

2 tasks

lerman25 reviewed Apr 30, 2026

View reviewed changes

dor-forer requested a review from lerman25 May 3, 2026 07:10

dor-forer enabled auto-merge May 3, 2026 07:32

lerman25 approved these changes May 3, 2026

View reviewed changes

dor-forer added this pull request to the merge queue May 3, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026

dor-forer added this pull request to the merge queue May 3, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026

dor-forer added this pull request to the merge queue May 3, 2026

dor-forer removed this pull request from the merge queue due to a manual request May 3, 2026

dor-forer added this pull request to the merge queue May 3, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026

dor-forer added this pull request to the merge queue May 4, 2026

Merged via the queue into main with commit 8c5791f May 4, 2026
17 checks passed

dor-forer deleted the feat/MOD-15141-sq8-fp16-scalar-distance branch May 4, 2026 07:27

Conversation

dor-forer commented Apr 29, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jit-ci Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🛡️ Jit Security Scan Results

Uh oh!

jit-ci Bot commented Apr 29, 2026

❌ Jit Scanner failed - Our team is investigating

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

lerman25 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

dor-forer May 3, 2026

Choose a reason for hiding this comment

Uh oh!

lerman25 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dor-forer commented Apr 29, 2026 •

edited by cursor Bot

Loading

jit-ci Bot commented Apr 29, 2026 •

edited

Loading

codecov Bot commented Apr 29, 2026 •

edited

Loading