
Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks [MOD-15141]#942

Merged
dor-forer merged 8 commits into main from feat/MOD-15141-sq8-fp16-scalar-distance
May 4, 2026

Conversation

@dor-forer
Collaborator

@dor-forer dor-forer commented Apr 29, 2026

Describe the changes in the pull request

Adds asymmetric SQ8-to-FP16 distance support (scalar path only) for L2, IP, and Cosine, where stored vectors are SQ8-quantized (uint8 + FP32 metadata) and queries are FP16. This is the first step (P1a) of MOD-15141; SIMD kernels (P1b/P1c) and QuantPreprocessor FP16 input support (P1a-2) are intentionally out of scope and tracked as follow-ups.

The scalar L2 kernel uses the algebraic identity ||x - y||² = ||x||² + ||y||² - 2 * IP(x, y), with precomputed FP32 sums of squares stored alongside both the SQ8 storage vector and the FP16 query; this keeps the quantization-error path consistent between the kernel and the test baseline. Each FP16 query element is widened to FP32 during accumulation, and both the SQ8 metadata (min, delta, x_sum, x_sum_squares) and the FP16 query metadata (y_sum, y_sum_squares) are read as FP32.
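As an illustration only, the metadata can be combined as below. This C++ sketch assumes dequantization x_i = min + delta * q_i and a layout in which the FP32 metadata fields directly follow the element arrays in the order listed above; the function names and the raw-uint16_t handling of FP16 are hypothetical and are not the PR's SQ8_FP16_InnerProduct_Impl or SQ8_FP16_L2Sqr. Under that assumption the inner product reduces to min * y_sum + delta * Σ q_i * y_i, and L2² then follows from the identity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Assumed layouts (hypothetical, for illustration only):
//   SQ8 storage: uint8_t q[dim] | float min | float delta | float x_sum | float x_sum_squares
//   FP16 query:  uint16_t h[dim] | float y_sum | float y_sum_squares

template <typename T>
static T load_unaligned(const void *ptr) {
    T value;
    std::memcpy(&value, ptr, sizeof(T)); // well-defined for any alignment
    return value;
}

// Widens IEEE 754 binary16 bits to FP32.
static float half_to_float(uint16_t h) {
    const uint32_t exp = (h >> 10) & 0x1Fu;
    const uint32_t mant = h & 0x3FFu;
    float magnitude;
    if (exp == 0) {
        magnitude = std::ldexp(static_cast<float>(mant), -24); // zero or subnormal
    } else if (exp == 31) {
        magnitude = mant ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        magnitude = std::ldexp(static_cast<float>(mant | 0x400u), static_cast<int>(exp) - 25);
    }
    return (h & 0x8000u) ? -magnitude : magnitude;
}

// IP(x, y) with x_i = min + delta * q_i:  min * y_sum + delta * sum_i(q_i * y_i)
static float sq8_fp16_ip(const uint8_t *storage, const uint16_t *query, size_t dim) {
    const unsigned char *s_meta = reinterpret_cast<const unsigned char *>(storage) + dim;
    const unsigned char *q_meta = reinterpret_cast<const unsigned char *>(query + dim);
    const float min = load_unaligned<float>(s_meta);
    const float delta = load_unaligned<float>(s_meta + sizeof(float));
    const float y_sum = load_unaligned<float>(q_meta);
    float acc = 0.0f;
    for (size_t i = 0; i < dim; ++i) {
        acc += static_cast<float>(storage[i]) * half_to_float(query[i]); // FP16 widened to FP32
    }
    return min * y_sum + delta * acc;
}

// Squared L2 via ||x||^2 + ||y||^2 - 2 * IP(x, y), using the stored sums of squares.
static float sq8_fp16_l2sqr(const uint8_t *storage, const uint16_t *query, size_t dim) {
    const unsigned char *s_meta = reinterpret_cast<const unsigned char *>(storage) + dim;
    const unsigned char *q_meta = reinterpret_cast<const unsigned char *>(query + dim);
    const float x_sum_squares = load_unaligned<float>(s_meta + 3 * sizeof(float));
    const float y_sum_squares = load_unaligned<float>(q_meta + sizeof(float));
    return x_sum_squares + y_sum_squares - 2.0f * sq8_fp16_ip(storage, query, dim);
}
```

Cosine is omitted, and x_sum is not read by either function in this sketch.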

The new GetDistFunc<sq8, float, float16> specialization routes through IP_SQ8_FP16_GetDistFunc, Cosine_SQ8_FP16_GetDistFunc, and L2_SQ8_FP16_GetDistFunc. These dispatchers currently always return the scalar implementation; SIMD chooser slots are reserved for P1b (MOD-15152) and P1c (MOD-15153).
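A minimal sketch of the routing shape described above, with a per-metric chooser that always falls back to scalar. DistFunc, Metric, the stand-in kernels, and the nullptr result for an unsupported metric are all assumptions made for this example, not VecSim's actual types or behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical distance-function signature; VecSim's real function type and
// dispatcher parameters are not reproduced here.
using DistFunc = float (*)(const void *storage, const void *query, size_t dim);

enum class Metric { L2, IP, Cosine, Unknown };

// Stand-ins for the scalar kernels; each returns a distinct tag so callers can tell them apart.
static float scalar_l2(const void *, const void *, size_t) { return 1.0f; }
static float scalar_ip(const void *, const void *, size_t) { return 2.0f; }
static float scalar_cosine(const void *, const void *, size_t) { return 3.0f; }

// Per-metric choosers: for now each always returns the scalar kernel. A SIMD build
// would put CPU-feature checks ahead of this scalar fallback.
static DistFunc choose_l2(size_t /*dim*/) { return scalar_l2; }
static DistFunc choose_ip(size_t /*dim*/) { return scalar_ip; }
static DistFunc choose_cosine(size_t /*dim*/) { return scalar_cosine; }

// Top-level routing by metric; returning nullptr for an unsupported metric is a
// choice made for this sketch only.
static DistFunc get_sq8_fp16_dist_func(Metric metric, size_t dim) {
    switch (metric) {
    case Metric::L2: return choose_l2(dim);
    case Metric::IP: return choose_ip(dim);
    case Metric::Cosine: return choose_cosine(dim);
    default: return nullptr;
    }
}
```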

Which issues this PR fixes

  1. MOD-15141

Main objects this PR modified

  1. src/VecSim/spaces/IP/IP.{h,cpp} — adds SQ8_FP16_InnerProduct_Impl, SQ8_FP16_InnerProduct, and SQ8_FP16_Cosine.
  2. src/VecSim/spaces/L2/L2.{h,cpp} — adds SQ8_FP16_L2Sqr using the algebraic identity over the precomputed sums-of-squares.
  3. src/VecSim/spaces/IP_space.{h,cpp} — adds IP_SQ8_FP16_GetDistFunc and Cosine_SQ8_FP16_GetDistFunc dispatchers.
  4. src/VecSim/spaces/L2_space.{h,cpp} — adds L2_SQ8_FP16_GetDistFunc dispatcher.
  5. src/VecSim/spaces/spaces.cpp — adds the GetDistFunc<vecsim_types::sq8, float, vecsim_types::float16> specialization wiring L2/IP/Cosine to the new dispatchers.
  6. tests/utils/tests_utils.h — adds populate_sq8_fp16_query and SQ8_FP16_NotOptimized_{InnerProduct,Cosine,L2Sqr} baselines (the L2 baseline mirrors the kernel's algebraic identity to avoid quantization-error drift at high dim).
  7. tests/unit/test_spaces.cpp — adds SQ8_FP16_NoOptimizationSpacesTest parameterized over 19 dimensions (including odd and SIMD-boundary residues: 1, 5, 7, 8, 9, 15, 16, 17, 31, 32, 33, 47, 48, 49, 63, 64, 65, 127, 128) for L2/IP/Cosine, plus SQ8_FP16_EdgeCases (zero query, constant storage, mixed-sign), SpacesTest.GetDistFuncSQ8FP16Asymmetric, and SpacesTest.GetDistFuncInvalidMetricSQ8ToFP16.
  8. tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp (new) + tests/benchmark/CMakeLists.txt — registers naive scalar benchmarks for L2/IP/Cosine; SIMD benchmark blocks will be added by P1b/P1c when the chooser symbols become available.
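A sketch of a dimension sweep in the spirit of SQ8_FP16_NoOptimizationSpacesTest: for each of the 19 dimensions listed above it generates deterministic SQ8 codes and FP16 query bits, computes L2² in FP32 through the identity, and compares it with a direct FP64 sum of squared differences. The data generator, min/delta values, function names, and tolerance are assumptions. Because the sums of squares here are computed from the dequantized values, the two paths should differ only by FP32 rounding; the PR's test instead compares the production kernel against SQ8_FP16_NotOptimized_L2Sqr, which this does not reproduce.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Widens binary16 bits to FP32 (finite inputs only; the generator below never produces inf/NaN).
static float half_to_float(uint16_t h) {
    const int exp = (h >> 10) & 0x1F;
    const float mant = static_cast<float>(h & 0x3FF);
    const float mag = (exp == 0) ? std::ldexp(mant, -24) : std::ldexp(mant + 1024.0f, exp - 25);
    return (h & 0x8000) ? -mag : mag;
}

// The dimensions listed for SQ8_FP16_NoOptimizationSpacesTest.
static const size_t kDims[] = {1,  5,  7,  8,  9,  15, 16, 17,  31, 32,
                               33, 47, 48, 49, 63, 64, 65, 127, 128};

// Worst relative gap between the identity-based FP32 L2^2 and a direct FP64 reference.
static double max_rel_error_over_dims() {
    double worst = 0.0;
    for (size_t dim : kDims) {
        const float min = -1.0f, delta = 2.0f / 255.0f; // hypothetical quantization params
        float x_sq = 0.0f, y_sq = 0.0f, y_sum = 0.0f, acc = 0.0f;
        double ref = 0.0;
        for (size_t i = 0; i < dim; ++i) {
            const uint8_t q = static_cast<uint8_t>((i * 53 + 7) % 256);
            const uint16_t mant = static_cast<uint16_t>((i * 37) % 1024);
            // Query values in +/-[1, 2), sign alternating with i.
            const uint16_t h = static_cast<uint16_t>(0x3C00u | mant | ((i & 1u) ? 0x8000u : 0u));
            const float x = min + delta * static_cast<float>(q); // dequantized storage value
            const float y = half_to_float(h);                    // FP16 widened to FP32
            x_sq += x * x;
            y_sq += y * y;
            y_sum += y;
            acc += static_cast<float>(q) * y;
            const double d = static_cast<double>(x) - static_cast<double>(y);
            ref += d * d;
        }
        const float ip = min * y_sum + delta * acc;
        const float l2 = x_sq + y_sq - 2.0f * ip;
        worst = std::fmax(worst, std::fabs(static_cast<double>(l2) - ref) / ref);
    }
    return worst;
}
```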

Test results

  • All 1096 test_spaces tests pass locally.
  • bm_spaces_sq8_fp16 builds and runs successfully.

Follow-ups (separate stories, out of scope here)

  • MOD-15152 (P1b) — x86 SIMD kernels (AVX2 / AVX512 / AVX512_VNNI / SSE4) and benchmark wiring.
  • MOD-15153 (P1c) — ARM SIMD kernels (NEON / SVE / SVE2) and benchmark wiring.
  • MOD-15141 P1a-2 — QuantPreprocessor FP16 input support (no plumbing into the index types yet).

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Medium Risk
Touches core distance computation and dispatch for a new datatype combination; primary risk is subtle correctness/UB issues around mixed-type layouts and unaligned metadata reads, mitigated by added load_unaligned and broad test coverage.

Overview
Adds asymmetric SQ8 storage + FP16 query support for IP, Cosine, and L2 by introducing new scalar kernels (SQ8_FP16_InnerProduct_Impl/SQ8_FP16_InnerProduct/SQ8_FP16_Cosine and SQ8_FP16_L2Sqr) that widen FP16 elements to FP32 and use precomputed SQ8/query metadata.

Wires the new mode into the public dispatch path via GetDistFunc<sq8, float, float16> and new *_SQ8_FP16_GetDistFunc helpers (currently always returning scalar implementations), and adds load_unaligned to safely read trailing FP32 metadata for odd dimensions.

Extends validation and performance tooling with a new bm_spaces_sq8_fp16 benchmark target plus unit tests (including odd-dimension unaligned-metadata and edge cases) and new SQ8-FP16 baseline/query-population utilities in tests_utils.h.

Reviewed by Cursor Bugbot for commit 304450d. Bugbot is set up for automated code reviews on this repo.

@dor-forer dor-forer changed the title MOD-15141 Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks Add SQ8-FP16 scalar kernels, dispatchers, tests and benchmarks [MOD-15141] Apr 29, 2026
@dor-forer dor-forer requested a review from Copilot April 29, 2026 11:26
@jit-ci

jit-ci Bot commented Apr 29, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@dor-forer dor-forer marked this pull request as ready for review April 29, 2026 11:28
@jit-ci

jit-ci Bot commented Apr 29, 2026

❌ Jit Scanner failed - Our team is investigating

Jit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions.



Contributor

Copilot AI left a comment


Pull request overview

Adds asymmetric SQ8-storage → FP16-query distance support (scalar path) for L2, Inner Product, and Cosine in VecSim, including dispatch plumbing plus unit tests and benchmarks.

Changes:

  • Implement scalar SQ8↔FP16 kernels for IP/Cosine and L2² using stored/query FP32 metadata (sum / sum-of-squares) and algebraic identities.
  • Add SQ8→FP16 GetDistFunc specialization and metric-specific dispatchers (currently scalar-only; SIMD chooser slots reserved).
  • Add test utilities, unit tests (including edge cases + parameterized dimensions), and a new benchmark target for SQ8→FP16 spaces.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

File Description
src/VecSim/spaces/IP/IP.cpp Adds SQ8→FP16 inner-product implementation used by IP/Cosine/L2.
src/VecSim/spaces/IP/IP.h Declares SQ8→FP16 IP/Cosine APIs and shared IP implementation.
src/VecSim/spaces/L2/L2.cpp Adds SQ8→FP16 L2² implementation using precomputed norms + IP.
src/VecSim/spaces/L2/L2.h Declares SQ8→FP16 L2² API.
src/VecSim/spaces/IP_space.cpp Adds SQ8→FP16 dispatcher stubs returning scalar implementations.
src/VecSim/spaces/IP_space.h Declares SQ8→FP16 IP/Cosine dispatcher APIs.
src/VecSim/spaces/L2_space.cpp Adds SQ8→FP16 L2 dispatcher stub returning scalar implementation.
src/VecSim/spaces/L2_space.h Declares SQ8→FP16 L2 dispatcher API.
src/VecSim/spaces/spaces.cpp Wires GetDistFunc<sq8, float, float16> to the new dispatchers.
tests/utils/tests_utils.h Adds SQ8→FP16 query population/preprocess helpers and baselines.
tests/unit/test_spaces.cpp Adds SQ8→FP16 correctness tests (basic, getters, param dims, edge cases).
tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp Introduces SQ8→FP16 scalar benchmarks for L2/IP/Cosine.
tests/benchmark/CMakeLists.txt Registers the new bm_spaces_sq8_fp16 benchmark target.


Comment thread src/VecSim/spaces/IP/IP.cpp Outdated
Comment thread src/VecSim/spaces/L2/L2.cpp Outdated
Comment thread tests/utils/tests_utils.h Outdated
Comment thread tests/unit/test_spaces.cpp
Comment thread tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp Outdated

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



Reviewed by Cursor Bugbot for commit 68703d4.

Comment thread tests/benchmark/spaces_benchmarks/bm_spaces_sq8_fp16.cpp Outdated
@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.93%. Comparing base (5bcc53e) to head (304450d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
+ Coverage   96.71%   96.93%   +0.22%     
==========================================
  Files         129      130       +1     
  Lines        8057     7712     -345     
==========================================
- Hits         7792     7476     -316     
+ Misses        265      236      -29     


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.



Comment thread src/VecSim/spaces/IP/IP.cpp Outdated
Comment thread src/VecSim/spaces/L2/L2.cpp Outdated
Comment thread tests/utils/tests_utils.h Outdated
…ccess

- Add load_unaligned<T> helper to VecSim/utils/alignment.h.
- Refactor SQ8_FP16_InnerProduct_Impl and SQ8_FP16_L2Sqr to use it for
  the misaligned FP32 metadata reads, replacing inline std::memcpy.
- Apply the same helper in tests/utils/tests_utils.h reference helpers
  (SQ8_FP16_NotOptimized_InnerProduct, SQ8_FP16_NotOptimized_L2Sqr),
  fixing the same alignment UB pattern that remained in the test path.
- Add SQ8_FP16_l2sqr_odd_dim_unaligned_metadata_test that exercises the
  L2 kernel with deterministically misaligned storage and query
  metadata addresses (asserts the metadata bytes are not 4-byte aligned
  before invoking the kernel).
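The idea behind SQ8_FP16_l2sqr_odd_dim_unaligned_metadata_test can be sketched as follows, assuming the FP32 metadata starts immediately after the element array. Starting from a 16-byte-aligned buffer, an odd dim places SQ8 metadata at a byte offset that is not a multiple of 4, and FP16 metadata at 2 * dim bytes, which is 2 mod 4. load_unaligned mirrors the helper shown in the review thread; the offset helpers and roundtrip_misaligned_metadata are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

template <typename T>
static inline T load_unaligned(const void *ptr) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "load_unaligned requires a trivially-copyable T");
    T value;
    std::memcpy(&value, ptr, sizeof(T));
    return value;
}

// Byte offset of the first trailing FP32 field (assumed: metadata follows the elements).
constexpr size_t sq8_metadata_offset(size_t dim) { return dim * sizeof(uint8_t); }
constexpr size_t fp16_metadata_offset(size_t dim) { return dim * sizeof(uint16_t); }

static bool is_aligned_to(const void *p, size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Writes `value` right after `dim` SQ8 elements, asserts the address really is not
// 4-byte aligned, then reads it back through load_unaligned.
static float roundtrip_misaligned_metadata(size_t dim, float value) {
    alignas(16) unsigned char buffer[256] = {};
    assert(dim + sizeof(float) <= sizeof(buffer));
    unsigned char *meta = buffer + sq8_metadata_offset(dim);
    assert(!is_aligned_to(meta, alignof(float)));
    std::memcpy(meta, &value, sizeof(value));
    return load_unaligned<float>(meta);
}
```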
@dor-forer dor-forer requested a review from lerman25 April 29, 2026 14:19
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.



Comment thread tests/utils/tests_utils.h
Comment thread tests/unit/test_spaces.cpp Outdated
Address review comments on SQ8_FP16 test alignment:
- Allocate FP16 query buffers in test_spaces.cpp / bm_spaces_sq8_fp16.cpp
  as std::vector<float16> / new float16[] (with extra slots for FP32
  metadata) so the production SQ8_FP16_* kernels' typed float16* loads
  never see a misaligned pointer.
- Harden the test-only helpers (populate/preprocess_sq8_fp16_query and
  SQ8_FP16_NotOptimized_*) to access float16 values via memcpy on
  uint16_t, so they remain safe under any caller alignment.
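A sketch of the allocation rule described above: two trailing FP32 fields need 8 bytes, i.e. four extra uint16_t slots, and test-side element reads go through memcpy so any base alignment works. fp16_query_slots, make_fp16_query, and read_fp16_bits are hypothetical names, and raw uint16_t stands in for the project's float16 type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// uint16_t slots needed for `dim` FP16 values followed by `n_meta` FP32 fields (rounded up).
constexpr size_t fp16_query_slots(size_t dim, size_t n_meta) {
    return dim + (n_meta * sizeof(float) + sizeof(uint16_t) - 1) / sizeof(uint16_t);
}

// The buffer is typed as uint16_t, so element loads through a uint16_t pointer are aligned.
// The trailing metadata is still written with memcpy because data() + dim may not be
// 4-byte aligned when dim is odd.
static std::vector<uint16_t> make_fp16_query(const std::vector<uint16_t> &bits, float y_sum,
                                             float y_sum_squares) {
    const size_t dim = bits.size();
    std::vector<uint16_t> buf(fp16_query_slots(dim, 2));
    std::memcpy(buf.data(), bits.data(), dim * sizeof(uint16_t));
    const float meta[2] = {y_sum, y_sum_squares};
    std::memcpy(buf.data() + dim, meta, sizeof(meta));
    return buf;
}

// Test-helper style access: copy the raw bits out with memcpy, valid for any base alignment.
static uint16_t read_fp16_bits(const void *base, size_t i) {
    uint16_t bits;
    std::memcpy(&bits, static_cast<const unsigned char *>(base) + i * sizeof(uint16_t),
                sizeof(bits));
    return bits;
}
```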
Comment on lines +19 to +26
template <typename T>
static inline T load_unaligned(const void *ptr) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "load_unaligned requires a trivially-copyable T");
    T value;
    std::memcpy(&value, ptr, sizeof(T));
    return value;
}
Collaborator


Same alignment concern as #discussion_r3161710275, but on the existing SQ8_FP32 / SQ8_SQ8 kernels — they read FP32 metadata via a direct reinterpret_cast<const float*> and have the same misalignment for odd dim. Can we either apply load_unaligned there too in this PR, or open a follow-up so we don't end up with two conventions for the same problem in adjacent functions?

Collaborator Author


You're right.
This is out of scope for this task, so I opened a ticket for it:
https://redislabs.atlassian.net/browse/MOD-15303

@dor-forer dor-forer requested a review from lerman25 May 3, 2026 07:10
@dor-forer dor-forer enabled auto-merge May 3, 2026 07:32
Collaborator

@lerman25 lerman25 left a comment


Good job!

@dor-forer dor-forer added this pull request to the merge queue May 3, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026
@dor-forer dor-forer added this pull request to the merge queue May 3, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026
@dor-forer dor-forer added this pull request to the merge queue May 3, 2026
@dor-forer dor-forer removed this pull request from the merge queue due to a manual request May 3, 2026
@dor-forer dor-forer added this pull request to the merge queue May 3, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 3, 2026
@dor-forer dor-forer added this pull request to the merge queue May 4, 2026
Merged via the queue into main with commit 8c5791f May 4, 2026
17 checks passed
@dor-forer dor-forer deleted the feat/MOD-15141-sq8-fp16-scalar-distance branch May 4, 2026 07:27

Labels

None yet

Projects

None yet


3 participants