feat(rust): add refine bindings#2230

Open
jamie8johnson wants to merge 2 commits into
rapidsai:mainfrom
jamie8johnson:rust-refine
Conversation

@jamie8johnson

Contributor

Add Rust bindings for the refine API

What

This PR adds safe Rust bindings for the cuvsRefine C API in the cuvs crate.

Refinement is a free function (not an index type) that follows an approximate
nearest-neighbors search: given a per-query candidate list produced by an ANN
method, it recomputes exact distances against the original dataset and selects
the true top-k. This lets callers trade a cheap approximate first pass for an
exact re-rank over a small candidate set.
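
To make the re-ranking step concrete, here is a minimal CPU sketch of what refinement computes. It is not the cuVS implementation: `refine_cpu` and `sq_l2` are hypothetical names, the data lives in plain `Vec`s rather than `ManagedTensor`s, and squared L2 is assumed as the metric.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn sq_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical CPU reference: for each query, recompute the exact distance
/// to every candidate row, sort ascending, and keep the k best.
fn refine_cpu(
    dataset: &[Vec<f32>],
    queries: &[Vec<f32>],
    candidates: &[Vec<i64>],
    k: usize,
) -> (Vec<Vec<i64>>, Vec<Vec<f32>>) {
    let mut indices = Vec::with_capacity(queries.len());
    let mut distances = Vec::with_capacity(queries.len());
    for (query, cands) in queries.iter().zip(candidates) {
        // Score only the candidate rows, not the whole dataset.
        let mut scored: Vec<(f32, i64)> = cands
            .iter()
            .map(|&id| (sq_l2(query, &dataset[id as usize]), id))
            .collect();
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored.truncate(k);
        indices.push(scored.iter().map(|&(_, id)| id).collect());
        distances.push(scored.iter().map(|&(d, _)| d).collect());
    }
    (indices, distances)
}

fn main() {
    let dataset = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 2.0],
        vec![100.0, 100.0], // far-away "noise" point
    ];
    let queries = vec![vec![0.1, 0.0]];
    // A deliberately mis-ordered candidate list with the noise index first.
    let candidates = vec![vec![3, 2, 1, 0]];
    let (indices, distances) = refine_cpu(&dataset, &queries, &candidates, 2);
    println!("indices = {:?}, distances = {:?}", indices, distances);
}
```

The cost is proportional to n_queries × n_candidates distance evaluations, which is why the candidate list is kept small relative to the dataset.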

The new cuvs::refine::refine free function mirrors the shape of the existing
cuvs::distance::pairwise_distance wrapper — it takes Resources, input/output
ManagedTensors, and a DistanceType, and returns Result<()>. No new index
struct is introduced.

pub fn refine(
    res: &Resources,
    dataset: &ManagedTensor,
    queries: &ManagedTensor,
    candidates: &ManagedTensor,
    metric: DistanceType,
    indices: &ManagedTensor,
    distances: &ManagedTensor,
) -> Result<()>

Files changed

  • rust/cuvs/src/refine.rs (new): refine() wrapper, doc comment with a
    compile-checked (no_run) example, and a behavioral unit test.
  • rust/cuvs/src/lib.rs: adds pub mod refine;.

Reviewer notes

  • Bindings already existed. cuvsRefine is already present in the generated
    rust/cuvs-sys/src/bindings.rs (it lives in core/all.h, adjacent to the
    ivf_flat block), so no cuvs-sys regeneration was required. This PR is
    Rust-side only.
  • Contract from c/src/neighbors/refine.cpp: all tensors must live in the
    same memory space (all device or all host — the C layer rejects mixing).
    candidates and output indices must be int64; output distances must be
    float32; queries/dataset dtype codes must match. k is taken from the
    output tensor shape ([n_queries, k]), and n_candidates >= k. The wrapper
    forwards tensors as-is and surfaces these constraints in the doc comment;
    validation is left to the C layer (consistent with the other wrappers).
  • The free-function placement (refine.rs at the crate root, alongside
    distance/) matches pairwise_distance. Open to relocating under a
    neighbors-style module if the crate later groups neighbor ops.
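
The contract in the second note can be read as a short checklist. The sketch below spells it out in plain Rust for illustration only: the wrapper in this PR does not perform these checks (validation stays in the C layer), and `TensorMeta`, `DType`, `MemSpace`, and `check_refine_contract` are hypothetical names, not part of the cuvs crate. The row-count consistency check is my reading of the `[n_queries, k]` output shape rather than something the PR states explicitly.

```rust
/// Illustrative dtype tags for this sketch. They are not the DLPack type
/// codes and make no claim about which dtypes cuvsRefine accepts.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    I8,
    I64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum MemSpace {
    Host,
    Device,
}

/// Hypothetical stand-in for the metadata a 2-D tensor carries.
#[derive(Debug, Clone, Copy)]
struct TensorMeta {
    shape: [usize; 2],
    dtype: DType,
    space: MemSpace,
}

/// Checks the constraints from the reviewer note and returns the inferred k.
fn check_refine_contract(
    dataset: &TensorMeta,
    queries: &TensorMeta,
    candidates: &TensorMeta,
    indices: &TensorMeta,
    distances: &TensorMeta,
) -> Result<usize, String> {
    let all = [dataset, queries, candidates, indices, distances];
    if all.iter().any(|t| t.space != dataset.space) {
        return Err("all tensors must live in the same memory space".into());
    }
    if candidates.dtype != DType::I64 || indices.dtype != DType::I64 {
        return Err("candidates and indices must be int64".into());
    }
    if distances.dtype != DType::F32 {
        return Err("distances must be float32".into());
    }
    if queries.dtype != dataset.dtype {
        return Err("queries and dataset dtypes must match".into());
    }
    // k comes from the output shape [n_queries, k].
    let [n_queries, k] = indices.shape;
    if queries.shape[0] != n_queries
        || candidates.shape[0] != n_queries
        || distances.shape != indices.shape
    {
        return Err("row counts or output shapes are inconsistent".into());
    }
    if candidates.shape[1] < k {
        return Err("n_candidates must be >= k".into());
    }
    Ok(k)
}

/// Convenience constructor for a device-resident tensor description.
fn device(shape: [usize; 2], dtype: DType) -> TensorMeta {
    TensorMeta { shape, dtype, space: MemSpace::Device }
}

fn main() {
    let dataset = device([8, 2], DType::F32);
    let queries = device([2, 2], DType::F32);
    let candidates = device([2, 5], DType::I64);
    let indices = device([2, 3], DType::I64);
    let distances = device([2, 3], DType::F32);
    let k = check_refine_contract(&dataset, &queries, &candidates, &indices, &distances);
    println!("inferred k = {:?}", k);

    // Moving one tensor to the host violates the same-memory-space rule.
    let host_queries = TensorMeta { space: MemSpace::Host, ..queries };
    let mixed = check_refine_contract(&dataset, &host_queries, &candidates, &indices, &distances);
    println!("mixed memory spaces: {:?}", mixed);
}
```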

Testing summary

  • cargo build -p cuvs — clean.
  • cargo test -p cuvs refine -- --test-threads=1 — the unit test
    test_refine_fixes_wrong_candidates passes. It builds a small, well-separated
    2-D dataset, hands refine deliberately wrong / mis-ordered candidate
    lists (each containing a planted far-away noise index), and asserts that the
    refined top-k exactly equals the brute-force exact top-k: the planted noise
    candidates are evicted, the true nearest neighbor is restored to rank 0, the
    refined index sets match the exact sets, and distances come back sorted
    ascending. This verifies real re-ranking behavior, not merely that the call
    succeeds.
  • cargo test -p cuvs --doc refine — the doc example compiles.
  • cargo fmt -p cuvs -- --check — clean.
  • cargo clippy -p cuvs — no findings on the new code. (There is a pre-existing
    not_unsafe_ptr_arg_deref lint on resources.rs::set_cuda_stream from a newer
    clippy; it is untouched by this PR.)
  • Built and tested against conda libcuvs 26.06 with the DLPack CMake package on
    CMAKE_PREFIX_PATH, on a single CUDA device.

Sibling-PR conflict note

This work was developed alongside a separate IVF-SQ bindings PR. Both touch
rust/cuvs/src/lib.rs (each adds one pub mod line). The additions are
independent and order-agnostic; whichever lands second will need a trivial
one-line merge in lib.rs. No other files overlap.

Add a safe Rust wrapper for the cuvsRefine C API in the cuvs crate.
Refine is a free function that re-ranks an approximate ANN candidate
list exactly against the original dataset and returns the true top-k.

- New cuvs::refine::refine free function mirroring the pairwise_distance
  wrapper shape: Resources + ManagedTensor inputs/outputs + DistanceType,
  returning Result<()>. No new index struct.
- The cuvsRefine binding already exists in cuvs-sys (core/all.h), so no
  bindgen regeneration was needed; this is Rust-side only.
- The doc comment states the C contract (uniform host/device memory,
  int64 candidates/indices, f32 distances, k inferred from output shape)
  and includes a compile-checked no_run example.
- Unit test feeds deliberately wrong/mis-ordered candidate lists with
  planted noise indices and asserts the refined top-k equals the exact
  brute-force top-k, verifying real re-ranking rather than non-crashing.

Built and tested against conda libcuvs 26.06 with the DLPack CMake
package; cargo fmt clean, no new clippy findings.
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 18553350-cd98-46b7-b7bc-d11661500225

📥 Commits

Reviewing files that changed from the base of the PR and between 6a8155f and c517744.

📒 Files selected for processing (1)
  • rust/cuvs/src/refine.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rust/cuvs/src/refine.rs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added a refinement API to improve candidate search results with multiple distance metrics and automatic k inference.
  • Tests

    • Added an integration-style unit test validating refined top-k correctness and distance ordering against brute-force results.
  • Documentation

    • Added comprehensive API docs and usage example describing input/output tensor requirements and runtime contracts.

Walkthrough

Adds a safe Rust wrapper for the cuVS C API cuvsRefine: documents the API and C-side contracts, exports a new refine module, implements cuvs::refine::refine that calls ffi::cuvsRefine, and includes an integration test validating top-k correction and distance ordering.

Changes

cuvsRefine Safe Rust Bindings

| Layer / File(s) | Summary |
| --- | --- |
| API Design and Contract Documentation (UPSTREAM_PR_BODY.md) | Documents the cuvs::refine::refine signature, argument and tensor dtype/memory contracts, k and n_candidates semantics, and the test expectations for correcting candidates and sorted distances. |
| Public Module Export (rust/cuvs/src/lib.rs) | Exports the new refine module via pub mod refine; at the crate root. |
| Safe Wrapper Implementation and Integration Testing (rust/cuvs/src/refine.rs) | Implements pub fn refine(...) -> Result<()> as a thin safe wrapper that calls ffi::cuvsRefine inside an unsafe block, forwarding ManagedTensor pointers and checking errors; adds an integration-style unit test that verifies refinement fixes wrong candidates and returns sorted top-k distances. |

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(rust): add refine bindings' directly and concisely summarizes the main change: adding Rust bindings for the refine API. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the purpose, API shape, files changed, testing, and implementation details of the refine bindings. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
rust/cuvs/src/refine.rs (1)

210-214: ⚡ Quick win

Assert full top-k distance ordering in the test.

Lines 210-214 only check dist[0] <= dist[1]; with k = 3, dist[1] <= dist[2] is not verified, so a partially unsorted output can still pass.

Suggested patch
-        assert!(distances_host[[0, 0]] <= distances_host[[0, 1]]);
-        assert!(distances_host[[1, 0]] <= distances_host[[1, 1]]);
+        for q in 0..n_queries {
+            for j in 0..(k - 1) {
+                assert!(
+                    distances_host[[q, j]] <= distances_host[[q, j + 1]],
+                    "q{} distances not sorted ascending: {:?}",
+                    q,
+                    distances_host.row(q)
+                );
+            }
+        }
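
As an alternative to the nested loop, the same check can be written over adjacent pairs. This is a sketch on plain slices rather than the ndarray rows used in the actual test, and `first_unsorted_pair` is a hypothetical helper name.

```rust
/// Returns the position of the first adjacent pair that is out of ascending
/// order, or None if the whole row is sorted. Writing the test as
/// `!(a <= b)` also flags pairs that involve NaN.
fn first_unsorted_pair(row: &[f32]) -> Option<usize> {
    row.windows(2).position(|w| !(w[0] <= w[1]))
}

fn main() {
    // Passes a check of only the first two entries, fails the full check.
    let partially_sorted = [0.1_f32, 0.5, 0.4];
    let sorted = [0.1_f32, 0.4, 0.5];

    println!("{:?}", first_unsorted_pair(&partially_sorted));
    println!("{:?}", first_unsorted_pair(&sorted));
}
```

Returning the offending position rather than a bool lets the assertion message name the pair that broke the ordering.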
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/cuvs/src/refine.rs` around lines 210 - 214, The test currently only
asserts pairwise ordering for the first two distances, allowing a
partially-unsorted top-k to pass; update the assertions on distances_host to
verify the full top-k ordering for k = 3 by adding checks that
distances_host[[0,1]] <= distances_host[[0,2]] and distances_host[[1,1]] <=
distances_host[[1,2]] (i.e., ensure distances_host[[i,0]] <=
distances_host[[i,1]] <= distances_host[[i,2]] for the relevant rows), so the
refined distances are fully sorted ascending for each row.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 75fcbb43-552b-4c6d-9c3e-79d54a5c2d74

📥 Commits

Reviewing files that changed from the base of the PR and between 78135be and 6a8155f.

📒 Files selected for processing (3)
  • UPSTREAM_PR_BODY.md
  • rust/cuvs/src/lib.rs
  • rust/cuvs/src/refine.rs
