
feat(index): support raw-query ivf rq search#7078

Open
BubbleCal wants to merge 13 commits into main from yang/ivfrq-pr3-split-code-query

Conversation

@BubbleCal

@BubbleCal BubbleCal commented Jun 3, 2026

Contributor

Feature

  • Adds explicit IVF_RQ query_estimator metadata so released indexes without the field continue to read as residual_query, while newly written indexes use raw_query.
  • Implements raw-query IVF_RQ search for new num_bits == 1 indexes and multi-bit split-code indexes, including ex-code factors and runtime-only rotated centroid caches derived from the original IvfModel centroids.
  • Prepares the rotated raw query and split-code lookup tables once per query worker and reuses them across probed partitions; each partition updates only the cluster correction.
  • Relaxes the public IVF_RQ num_bits > 1 gate for supported metrics, including cosine via Lance's normalized-L2 handling.
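The cosine bullet rests on a standard identity: for unit-length vectors, squared L2 distance equals twice the cosine distance. The following is a minimal sketch of that identity in plain Python. It is not Lance's implementation, and the helper names are illustrative only.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_squared(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity, computed on the unnormalized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# After normalization, ||a - b||^2 = 2 - 2 * cos(a, b) = 2 * cosine_distance.
l2_on_unit = l2_squared(normalize(a), normalize(b))
cos_dist = cosine_distance(a, b)
```

Because the two quantities differ only by a constant factor, ranking by L2 on normalized vectors yields the same order as ranking by cosine distance.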

Compatibility

  • Old IVF_RQ indexes that lack query_estimator metadata still default to the legacy residual-query estimator.
  • Original IVF centroids remain the source of truth for partition assignment, incremental indexing, and persisted metadata.
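The first compatibility rule can be sketched as a default applied at read time. This assumes the metadata is JSON with an optional `query_estimator` key, as the review comments further down describe. The function name and the sample metadata are hypothetical and do not mirror the Rust code.

```python
import json

# Estimator assumed for indexes written before the field existed.
LEGACY_ESTIMATOR = "residual_query"

def read_query_estimator(metadata_json: str) -> str:
    """Return the estimator named in the metadata, or the legacy default."""
    metadata = json.loads(metadata_json)
    return metadata.get("query_estimator", LEGACY_ESTIMATOR)

# Hypothetical metadata payloads; only `query_estimator` matters here.
old_index = json.dumps({"num_bits": 1})
new_index = json.dumps({"num_bits": 3, "query_estimator": "raw_query"})

old_estimator = read_query_estimator(old_index)
new_estimator = read_query_estimator(new_index)
```

The default only protects a new reader opening an old index. It does nothing for an old reader opening a new index, because that reader never looks for the key.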

Performance Improvement

The benchmark below was run with search-benchmark on GCP VM yang-agent-00bd-ivfrq-rerun-20260605, dataset gist, k=10, max_threads=1, target_partition_size=4096, no refine. Latencies are converted from CSV seconds to milliseconds.

Provenance:

  • search-benchmark commit: 61ef8f7b97589032a83eeae1e52664be9f035551
  • main Lance baseline commit: 437849118f380d92c1ea849f99996e9072be58df
  • PR branch commit benchmarked: ce548a49766670b80275daae6f1bf97c70e885e4

Additional DBpedia comparison on the same VM, current branch only, dataset dbpedia, k=10, max_threads=1, target_partition_size=4096. For IVF_PQ, sub_vector_dim=8; one extra row includes refine_factor=2 at nprobes=24.

| Index | Config | nprobes | refine | recall@10 | avg ms | p99 ms | QPS | indexing s |
|---|---|---|---|---|---|---|---|---|
| IVF_RQ | num_bits=1 | 8 | - | 0.7917 | 1.59 | 1.98 | 615.8 | 16.45 |
| IVF_RQ | num_bits=1 | 16 | - | 0.8102 | 2.35 | 2.98 | 420.3 | 16.45 |
| IVF_RQ | num_bits=1 | 24 | - | 0.8162 | 3.19 | 3.93 | 311.4 | 16.45 |
| IVF_RQ | num_bits=3 | 8 | - | 0.9014 | 2.14 | 2.63 | 463.8 | 27.01 |
| IVF_RQ | num_bits=3 | 16 | - | 0.9263 | 2.93 | 3.58 | 338.9 | 27.01 |
| IVF_RQ | num_bits=3 | 24 | - | 0.9352 | 3.82 | 4.74 | 261.0 | 27.01 |
| IVF_RQ | num_bits=5 | 8 | - | 0.9207 | 2.32 | 2.80 | 426.2 | 33.93 |
| IVF_RQ | num_bits=5 | 16 | - | 0.9520 | 3.32 | 4.05 | 300.1 | 33.93 |
| IVF_RQ | num_bits=5 | 24 | - | 0.9624 | 4.56 | 5.57 | 218.1 | 33.93 |
| IVF_RQ | num_bits=7 | 8 | - | 0.9278 | 2.84 | 3.39 | 350.3 | 46.76 |
| IVF_RQ | num_bits=7 | 16 | - | 0.9572 | 3.77 | 4.45 | 264.1 | 46.76 |
| IVF_RQ | num_bits=7 | 24 | - | 0.9683 | 4.96 | 5.94 | 200.7 | 46.76 |
| IVF_PQ | sub_vector_dim=8 | 8 | - | 0.7354 | 4.44 | 5.50 | 223.7 | 153.84 |
| IVF_PQ | sub_vector_dim=8 | 16 | - | 0.7447 | 8.05 | 9.68 | 123.6 | 153.84 |
| IVF_PQ | sub_vector_dim=8 | 24 | - | 0.7483 | 12.80 | 14.72 | 78.0 | 153.84 |
| IVF_PQ | sub_vector_dim=8 | 24 | 2 | 0.9133 | 12.84 | 14.96 | 77.7 | 153.84 |

Tests

  • cargo fmt --all
  • cargo test -p lance-index raw_query
  • cargo test -p lance-index try_from_batch_
  • cargo test -p lance-index rabit_quantizer
  • cargo test -p lance test_rabitq_distance_types
  • cargo test -p lance test_build_ivf_rq
  • cargo clippy -p lance-index -p lance --tests --benches -- -D warnings
  • uv run make build
  • targeted Python IVF_RQ multi-bit and cosine search test
  • targeted Python distributed IVF_RQ shared-rotation test

@claude claude Bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

github-actions Bot added the A-python (Python bindings), A-index (Vector index, linalg, tokenizer), and A-format (On-disk format: protos and format spec docs) labels Jun 3, 2026
@github-actions

github-actions Bot commented Jun 3, 2026

Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

github-actions Bot added the enhancement (New feature or request) label Jun 3, 2026
Base automatically changed from yang/ivfrq-num-bits-pr2-bench to main June 5, 2026 06:09
BubbleCal force-pushed the yang/ivfrq-pr3-split-code-query branch from ce548a4 to 763ecc0 June 8, 2026 05:37
@BubbleCal

Contributor Author

@claude review

@BubbleCal

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42b122a38b


Comment thread rust/lance-index/src/vector/bq/builder.rs

@claude claude Bot left a comment


This is a substantial new feature touching the Lance format spec, the IVF_RQ storage layout, and the core distance-calculator path — I'm leaving it for a human reviewer.

Extended reasoning...

Overview

This PR introduces a new raw_query estimator for IVF_RQ search alongside the existing residual_query estimator, including multi-bit (num_bits > 1) split-code search. Changes span 13 files: a format-spec doc update, new metadata fields (query_estimator), substantial additions to bq/storage.rs (~1100 lines including new ex-code distance tables, FastScan paths, and lower-bound gating), a refactor of the flat index search loops to use new accumulate_topk_with_scratch trait methods, a fix to PERM0_INVERSE in the SIMD dist table (the prior value was incorrect — note the new test_perm0_inverse_matches_perm0 test), and IVF v2 plumbing for per-query rotated-query contexts and per-index rotated-centroid caches.

Security risks

No direct security risks — this is internal indexing code with no auth, crypto, or untrusted-input boundaries.

Level of scrutiny

High. This PR touches the format specification (per the format-change-vote reminder, this needs a PMC vote with 3 binding +1s and a 1-week voting period), modifies the storage layout for newly-built IVF_RQ indexes, and rewrites the search hot path with new lower-bound gating that affects recall/accuracy. The compatibility shim (default_query_estimator_compat → ResidualQuery) is the right approach for old indexes, but the correctness of the new raw-query estimator factors, error-factor gating, and FastScan ex-code path all warrant careful human review.

Other factors

  • The PR has good test coverage: new unit tests for the estimator factors, FastScan ex-code paths, lower-bound gating, cache slice borrowing, and a parameterized integration test for multi-bit L2/Cosine search.
  • A previously-incorrect PERM0_INVERSE constant is silently fixed here — worth a human verifying no callers depended on the buggy value.
  • The Cosine→L2 distance-type rewrite in try_from_batch for raw-query indexes is subtle and could surprise readers; worth confirming the rewrite is correct and intentional.
  • The format-spec change requires a PMC vote that has not yet been linked on this PR.

Comment thread rust/lance-linalg/src/simd/dist_table.rs
Comment thread python/python/tests/test_vector_index.py Outdated
Comment thread docs/src/format/index/vector/index.md
Comment thread rust/lance/src/index/vector/ivf/v2.rs
Comment on lines +277 to +282
def skip_write_after_current_write(self, version: str) -> bool:
    # Newly written IVF_RQ indexes carry raw-query estimator metadata and
    # split-code schema that older runtimes can query but cannot optimize.
    # The upgrade_downgrade variant still covers old 1-bit residual-query
    # indexes being read and rewritten by the current runtime.
    return True


🔴 The new skip_write_after_current_write override (test_vector_indices.py:277-282) leaves skip_read_after_current_write at the default False, so the downgrade flow still runs the old runtime's check_read against the newly-written num_bits=1 IVF_RQ index — but builder.rs::new_with_rotation now unconditionally writes query_estimator: RawQuery (lines 240, 251), a field old serde silently drops, so the old runtime applies the legacy ResidualQuery distance formula to __add_factors/__scale_factors that were computed for the RawQuery formula. check_read only asserts result.num_rows == 4 and never validates rankings, so the test passes vacuously while real downgraded users see silent recall regression and the PR-author comment claiming "older runtimes can query the new indexes (just cannot optimize)" is wrong for the num_bits=1 path. Fix: either also override skip_read_after_current_write to True (and remove the misleading comment), or keep ResidualQuery as the default query_estimator for num_bits=1 so the on-disk meaning of __add_factors/__scale_factors stays compatible with released runtimes (the num_bits>1 path was already rejected by old runtimes via the removed validate_supported_rq_num_bits gate).

Extended reasoning...

What the bug is

The PR adds skip_write_after_current_write = True to IvfRqVectorIndex (test_vector_indices.py:277-282) with a comment claiming "older runtimes can query the new indexes (just cannot optimize)." That premise is false for num_bits=1, and the downgrade direction is therefore exercising a silently-broken read path that the compat test cannot catch.

In compat_decorator.py:340-343, the downgrade flow is now gated as:

if not obj.skip_read_after_current_write(version):
    venv.execute_method(obj, "check_read", obj.compat_env(version, "check_read"))
if not obj.skip_write_after_current_write(version):
    venv.execute_method(obj, "check_write", obj.compat_env(version, "check_write"))

IvfRqVectorIndex overrides only skip_write_after_current_write (returns True). skip_read_after_current_write inherits the default False from UpgradeDowngradeTest (compat_decorator.py:152-154), so the old venv still runs check_read against the index written by the current runtime.
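The gating described above can be reproduced in a few lines. The sketch below reuses the class and method names quoted in this comment. The `downgrade_steps` helper and the version string are hypothetical stand-ins for the venv execution in compat_decorator.py.

```python
class UpgradeDowngradeTest:
    """Base class: by default, neither downgrade check is skipped."""

    def skip_read_after_current_write(self, version: str) -> bool:
        return False

    def skip_write_after_current_write(self, version: str) -> bool:
        return False


class IvfRqVectorIndex(UpgradeDowngradeTest):
    """Overrides only the write hook, as in the PR."""

    def skip_write_after_current_write(self, version: str) -> bool:
        return True


def downgrade_steps(obj: UpgradeDowngradeTest, version: str) -> list:
    """Hypothetical helper: list the checks the old runtime would still run."""
    steps = []
    if not obj.skip_read_after_current_write(version):
        steps.append("check_read")
    if not obj.skip_write_after_current_write(version):
        steps.append("check_write")
    return steps


# "old-release" is a placeholder version label, not a real release number.
steps = downgrade_steps(IvfRqVectorIndex(), "old-release")
```

With only the write hook overridden, `check_read` remains in the list, so the old runtime still reads the newly written index.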

The on-disk meaning of factor columns changed

builder.rs::RabitQuantizer::new_with_rotation unconditionally sets query_estimator: RabitQueryEstimator::RawQuery in both the Matrix branch (line 240) and the Fast branch (line 251) — there is no num_bits=1 carve-out. So every newly-built IVF_RQ index, including num_bits=1, ships with the raw-query estimator and the new RawQuery factor formula:

// transform.rs (raw-query, num_bits=1, L2)
scale = -2 * |res|^2 / binary_res_dot               // binary_res_dot = 0.5 * sum(|rotated|)
add   = |res|^2 + 2 * binary_correction             // binary_correction = |res|^2 * binary_cent_dot / binary_res_dot

The released runtime's transform writes:

// transform.rs (residual-query, L2)
scale = -2 * |res|^2 / ip_rq_res                    // ip_rq_res = sum(|rotated|) / sqrt(d)
add   = |res|^2

binary_res_dot and ip_rq_res are related by binary_res_dot = (sqrt(d) / 2) * ip_rq_res, so the new scale magnitude is (2 / sqrt(d)) times the legacy magnitude. add gains an extra 2 * binary_correction term that depends on per-row residual signs vs the rotated centroid — a row-dependent perturbation that breaks rank ordering, not just a constant offset.
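The 2 / sqrt(d) ratio can be checked numerically from the two definitions quoted above. This sketch assumes an arbitrary rotated residual of dimension 32 and takes |res|^2 as the squared norm of that vector, since a rotation preserves length. The values are made up for illustration.

```python
import math

def ip_rq_res(rotated):
    """Legacy denominator: sum(|rotated|) / sqrt(d)."""
    return sum(abs(x) for x in rotated) / math.sqrt(len(rotated))

def binary_res_dot(rotated):
    """Raw-query denominator: 0.5 * sum(|rotated|)."""
    return 0.5 * sum(abs(x) for x in rotated)

# Arbitrary illustrative rotated residual with d = 32.
rotated = [0.5, -1.25, 2.0, -0.75] * 8
d = len(rotated)
res_norm_sq = sum(x * x for x in rotated)

legacy_scale = -2.0 * res_norm_sq / ip_rq_res(rotated)
raw_scale = -2.0 * res_norm_sq / binary_res_dot(rotated)

# The ratio depends only on d, not on the vector's values.
ratio = raw_scale / legacy_scale
```

For d = 32 the ratio is 2 / sqrt(32), roughly 0.354, which matches the figure used in the step-by-step proof.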

The old runtime parses the new index as if it were a residual-query index

The pre-PR RabitQuantizationMetadata struct has no query_estimator field, and there is no #[serde(deny_unknown_fields)] (verified in storage.rs:220 and earlier release commits). Serde silently ignores the new "query_estimator":"raw_query" key. The old runtime then plugs the on-disk factor values — which now mean RawQuery — into the legacy ResidualQuery formula in storage.rs::distance():

let dist_vq_qr = (2.0 * dist - sum_q) / sqrt_d;
dist_vq_qr * scale + add + query_factor

while the new code applies (dist - 0.5 * sum_q) * scale + add + query_factor to the same column values. The old code also residualizes the query before building dist_table (via QueryResidual::Centroid), while the raw-query factors are computed assuming a raw (un-residualized) query. The new extra column __error_factors and the new metadata field are both silently ignored by the old reader (column_by_name tolerates extras and old metadata struct has no field for them), so the index loads cleanly and just returns wrong distances.
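To make the mismatch concrete, the sketch below evaluates both formulas on the same stored factors. The numeric inputs are arbitrary, and the sketch ignores the separate query-residualization difference, so it isolates only the scale term.

```python
import math

def legacy_distance(dist, sum_q, sqrt_d, scale, add, query_factor):
    """Residual-query formula used by the released runtime."""
    dist_vq_qr = (2.0 * dist - sum_q) / sqrt_d
    return dist_vq_qr * scale + add + query_factor

def raw_query_distance(dist, sum_q, scale, add, query_factor):
    """Raw-query formula used by the new code."""
    return (dist - 0.5 * sum_q) * scale + add + query_factor

sqrt_d = math.sqrt(32)
# Arbitrary illustrative inputs; scale and add stand for raw-query factors on disk.
dist, sum_q = 40.0, 50.0
scale, add, query_factor = -0.8, 3.0, 1.5

intended = raw_query_distance(dist, sum_q, scale, add, query_factor)
misread = legacy_distance(dist, sum_q, sqrt_d, scale, add, query_factor)

# With add and query_factor removed, the legacy term is 2 / sqrt(d) times the intended one.
intended_term = intended - add - query_factor
misread_term = misread - add - query_factor
```

The scaled term shrinks by 2 / sqrt(d) while `add` is applied unchanged, so the balance between the two terms shifts and the ordering of candidates can change.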

Why check_read does not catch it

IvfRqVectorIndex.check_read (test_vector_indices.py:311-340) asserts:

assert result.num_rows == 4
# plus num_indexed_rows > 0 and (under current runtime) num_bits == 1

No ground-truth k-NN comparison, no recall threshold, no row-id verification. The IVF search still returns 4 rows when distances are completely wrong — they are just the wrong 4 rows. CLAUDE.md:111 mandates a >=0.5 recall threshold for vector index tests precisely to catch this class of silent regression, and the compat test was the place where it should have triggered.
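A toy example shows why a row-count assertion cannot detect this. The distances below are invented, and the "wrong" list is deliberately inverted to stand in for a misapplied formula. The 0.5 threshold is the recall floor mentioned above.

```python
def top_k(distances, k):
    """Return the k row ids with the smallest distances."""
    return sorted(range(len(distances)), key=lambda i: distances[i])[:k]

def recall(found, expected):
    """Fraction of the expected row ids that were actually returned."""
    return len(set(found) & set(expected)) / len(expected)

true_distances = [0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 7.0, 8.0]
# Deliberately inverted distances, standing in for a wrong distance formula.
wrong_distances = [8.0, 7.0, 6.0, 5.0, 0.4, 0.3, 0.2, 0.1]

expected = top_k(true_distances, 4)
found = top_k(wrong_distances, 4)

count_check_passes = len(found) == 4                   # what check_read asserts
recall_check_passes = recall(found, expected) >= 0.5   # what would catch the bug
```

The count check passes even though none of the returned rows are true neighbors. Only the recall comparison fails.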

Step-by-step proof

  1. Build a num_bits=1 IVF_RQ index on the current branch. The metadata JSON includes "query_estimator":"raw_query" and the aux file gains an __error_factors column.
  2. transform.rs::transform takes the new branch (query_estimator != ResidualQuery) and writes __add_factors/__scale_factors via compute_raw_query_factors — values that the new storage.rs::distance() RawQuery branch consumes as (binary_dot - 0.5 * sum_q) * scale + add + query_factor.
  3. The downgrade venv (pre-PR) opens the same aux file. Its RabitQuantizationMetadata deserialize ignores query_estimator (no deny_unknown_fields) and ignores __error_factors (only named columns are pulled).
  4. Its distance() applies the legacy formula (2 * dist - sum_q) / sqrt_d * scale + add + query_factor to columns whose magnitudes are off by 2/sqrt(d) in scale and disturbed by the row-dependent 2 * binary_correction in add. For dim=32 (the compat dataset), 2/sqrt(32) ≈ 0.354, so the scale magnitudes are roughly a third of what the old formula expects.
  5. check_read issues a 4-NN search and asserts result.num_rows == 4. The IVF/FLAT pipeline returns four rows from the probed partition regardless of distance values, so the assertion holds and the test passes — but the rows returned are not the four nearest under either metric.

How to fix

Either option from the original report works and is small:

  • (a) Override skip_read_after_current_write to True in IvfRqVectorIndex, and replace the misleading comment with an explicit note that downgrade-direction reads are not supported once a num_bits=1 index is rewritten by current. This honestly marks the compat regression rather than papering over it.
  • (b) Keep ResidualQuery as the default query_estimator when num_bits == 1 in builder.rs::new_with_rotation (only switch to RawQuery for num_bits > 1, which old runtimes already rejected via validate_supported_rq_num_bits before this PR removed that gate). The binary-only on-disk layout then stays bit-for-bit compatible with released runtimes, and the PR's stated compat claim becomes true.
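Option (b) amounts to choosing the estimator from `num_bits`. The function below is a hypothetical Python rendering of that rule, not the Rust code in builder.rs. The string values follow the metadata names used in the PR description.

```python
def default_query_estimator(num_bits: int) -> str:
    """Pick the estimator for a newly built index under option (b)."""
    if num_bits < 1:
        raise ValueError("num_bits must be at least 1")
    # 1-bit indexes keep the legacy layout that released runtimes understand.
    if num_bits == 1:
        return "residual_query"
    # Multi-bit indexes were already rejected by old runtimes, so they can switch.
    return "raw_query"

one_bit = default_query_estimator(1)
multi_bit = default_query_estimator(3)
```

Under this rule, 1-bit indexes would keep the legacy estimator, so the raw-query search described for new `num_bits == 1` indexes would no longer apply to them.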

Optionally combine with strengthening check_read to assert recall against an exhaustive top-k so this class of silent regression cannot regress in the future.


Labels

A-format (On-disk format: protos and format spec docs), A-index (Vector index, linalg, tokenizer), A-python (Python bindings), enhancement (New feature or request)

1 participant