feat(rust): add refine bindings#2230
Conversation
Add a safe Rust wrapper for the cuvsRefine C API in the cuvs crate. Refine is a free function that re-ranks an approximate ANN candidate list exactly against the original dataset and returns the true top-k. - New cuvs::refine::refine free function mirroring the pairwise_distance wrapper shape: Resources + ManagedTensor inputs/outputs + DistanceType, returning Result<()>. No new index struct. - The cuvsRefine binding already exists in cuvs-sys (core/all.h), so no bindgen regeneration was needed; this is Rust-side only. - Doc comment documents the C contract (uniform host/device memory, int64 candidates/indices, f32 distances, k inferred from output shape) with a runnable no_run example. - Unit test feeds deliberately wrong/mis-ordered candidate lists with planted noise indices and asserts the refined top-k equals the exact brute-force top-k, verifying real re-ranking rather than non-crashing. Built and tested against conda libcuvs 26.06 with the DLPack CMake package; cargo fmt clean, no new clippy findings.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a safe Rust wrapper for the cuVS C API ChangescuvsRefine Safe Rust Bindings
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
rust/cuvs/src/refine.rs (1)
210-214: ⚡ Quick winAssert full top-
kdistance ordering in the test.Line 210-214 only checks
dist[0] <= dist[1]; withk = 3,dist[1] <= dist[2]is not verified, so a partially unsorted output can still pass.Suggested patch
- assert!(distances_host[[0, 0]] <= distances_host[[0, 1]]); - assert!(distances_host[[1, 0]] <= distances_host[[1, 1]]); + for q in 0..n_queries { + for j in 0..(k - 1) { + assert!( + distances_host[[q, j]] <= distances_host[[q, j + 1]], + "q{} distances not sorted ascending: {:?}", + q, + distances_host.row(q) + ); + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust/cuvs/src/refine.rs` around lines 210 - 214, The test currently only asserts pairwise ordering for the first two distances, allowing a partially-unsorted top-k to pass; update the assertions on distances_host to verify the full top-k ordering for k = 3 by adding checks that distances_host[[0,1]] <= distances_host[[0,2]] and distances_host[[1,1]] <= distances_host[[1,2]] (i.e., ensure distances_host[[i,0]] <= distances_host[[i,1]] <= distances_host[[i,2]] for the relevant rows), so the refined distances are fully sorted ascending for each row.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@rust/cuvs/src/refine.rs`:
- Around line 210-214: The test currently only asserts pairwise ordering for the
first two distances, allowing a partially-unsorted top-k to pass; update the
assertions on distances_host to verify the full top-k ordering for k = 3 by
adding checks that distances_host[[0,1]] <= distances_host[[0,2]] and
distances_host[[1,1]] <= distances_host[[1,2]] (i.e., ensure
distances_host[[i,0]] <= distances_host[[i,1]] <= distances_host[[i,2]] for the
relevant rows), so the refined distances are fully sorted ascending for each
row.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 75fcbb43-552b-4c6d-9c3e-79d54a5c2d74
📒 Files selected for processing (3)
UPSTREAM_PR_BODY.mdrust/cuvs/src/lib.rsrust/cuvs/src/refine.rs
Add Rust bindings for the
refineAPIWhat
This PR adds safe Rust bindings for the
cuvsRefineC API in thecuvscrate.Refinement is a free function (not an index type) that follows an approximate
nearest-neighbors search: given a per-query candidate list produced by an ANN
method, it recomputes exact distances against the original dataset and selects
the true top-
k. This lets callers trade a cheap approximate first pass for anexact re-rank over a small candidate set.
The new
cuvs::refine::refinefree function mirrors the shape of the existingcuvs::distance::pairwise_distancewrapper — it takesResources, input/outputManagedTensors, and aDistanceType, and returnsResult<()>. No new indexstruct is introduced.
Files changed
rust/cuvs/src/refine.rs(new) —refine()wrapper, doc comment with arunnable (
no_run) example, and a behavioral unit test.rust/cuvs/src/lib.rs—pub mod refine;.Reviewer notes
cuvsRefineis already present in the generatedrust/cuvs-sys/src/bindings.rs(it lives incore/all.h, adjacent to theivf_flatblock), so nocuvs-sysregeneration was required. This PR isRust-side only.
c/src/neighbors/refine.cpp: all tensors must live in thesame memory space (all device or all host — the C layer rejects mixing).
candidatesand outputindicesmust beint64; outputdistancesmust befloat32;queries/datasetdtype codes must match.kis taken from theoutput tensor shape (
[n_queries, k]), andn_candidates >= k. The wrapperforwards tensors as-is and surfaces these constraints in the doc comment;
validation is left to the C layer (consistent with the other wrappers).
refine.rsat the crate root, alongsidedistance/) matchespairwise_distance. Open to relocating under aneighbors-style module if the crate later groups neighbor ops.Testing summary
cargo build -p cuvs— clean.cargo test -p cuvs refine -- --test-threads=1— the unit testtest_refine_fixes_wrong_candidatespasses. It builds a small, well-separated2-D dataset, hands
refinedeliberately wrong / mis-ordered candidatelists (each containing a planted far-away noise index), and asserts that the
refined top-
kexactly equals the brute-force exact top-k: the planted noisecandidates are evicted, the true nearest neighbor is restored to rank 0, the
refined index sets match the exact sets, and distances come back sorted
ascending. This verifies real re-ranking behavior, not merely that the call
succeeds.
cargo test -p cuvs --doc refine— the doc example compiles.cargo fmt -p cuvs -- --check— clean.cargo clippy -p cuvs— no findings on the new code. (There is a pre-existingnot_unsafe_ptr_arg_dereflint onresources.rs::set_cuda_streamfrom a newerclippy; it is untouched by this PR.)
libcuvs26.06 with the DLPack CMake package onCMAKE_PREFIX_PATH, on a single CUDA device.Sibling-PR conflict note
This work was developed alongside a separate IVF-SQ bindings PR. Both touch
rust/cuvs/src/lib.rs(each adds onepub modline). The additions areindependent and order-agnostic; whichever lands second will need a trivial
one-line merge in
lib.rs. No other files overlap.