
SNMG Batched KMeans Python API#2154

Open
viclafargue wants to merge 4 commits into rapidsai:main from viclafargue:snmg-ooc-kmeans-python-api

Conversation

@viclafargue viclafargue commented Jun 2, 2026

Contributor

Closes #2149 and #2155

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added multi‑GPU K‑Means: new C API entry point (v2) and Python bindings exposing fit with host‑array inputs, optional sample weights, and centroid outputs.
  • Documentation

    • Added C and Python API reference pages and updated navigation to include Multi‑GPU K‑Means.
  • Tests

    • Added C and Python test suites exercising multi‑GPU K‑Means behavior, options, and input validation.

Walkthrough

Adds single-node multi-GPU K-Means: a C API and implementation accepting DLPack tensors, Python Cython bindings that validate/convert host NumPy inputs and call the C API, GPU-gated tests, build wiring, and Fern API docs/navigation.

Changes

Multi-GPU K-Means APIs

| Layer / File(s) | Summary |
|---|---|
| C API surface and library wiring: `c/include/cuvs/cluster/mg_kmeans.h`, `c/include/cuvs/core/all.h`, `c/CMakeLists.txt` | New public C header declares `cuvsMultiGpuKMeansFit`; header is conditionally included in aggregate headers and the implementation source is compiled into `cuvs_c` only when `BUILD_MG_ALGOS` is enabled. |
| C implementation, dispatch, and tests: `c/src/cluster/mg_kmeans.cpp`, `c/tests/CMakeLists.txt`, `c/tests/cluster/kmeans_mg_c.cu` | Converts public params to native params, validates DLManagedTensors (host, C-contiguous, dtype/shape), selects float/double dispatch, runs SNMG fit (device setup, centroid allocation/initialization, fit, copy back), exports `cuvsMultiGpuKMeansFit`, and adds conditional C test exercising host execution. |
| Python package scaffolding & shared types: `python/cuvs/cuvs/cluster/CMakeLists.txt`, `python/cuvs/cuvs/cluster/__init__.py`, `python/cuvs/cuvs/cluster/kmeans/kmeans.pxd`, `python/cuvs/cuvs/cluster/kmeans/kmeans.pyx` | Adds `mg` subpackage to the cluster package and build, moves/declares native `cuvsKMeansParams_v2` and `KMeansParams` native field into `.pxd` for sharing across modules. |
| Python mg.kmeans Cython bindings and wrapper: `python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd`, `python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx`, `python/cuvs/cuvs/cluster/mg/kmeans/__init__.py`, `python/cuvs/cuvs/cluster/mg/*` | Adds Cython extern for `cuvsMultiGpuKMeansFit`, implements `_as_host_array` validation helper, `FitOutput` namedtuple, and `fit` wrapper that converts NumPy inputs to DLPack, extracts native resources, calls the C API, and returns (centroids, inertia, n_iter). |
| Python tests and validation: `python/cuvs/cuvs/tests/test_mg_kmeans.py` | GPU-gated pytest module with helpers for synthetic data, host label/inertia reference computations, parametrized fit tests across dtypes/init methods/weights, and comprehensive input-validation error tests. |
| C and Python API documentation: `fern/pages/c_api/c-api-cluster-mg-kmeans.md`, `fern/pages/python_api/python-api-cluster-mg-kmeans.md`, `fern/docs.yml`, `fern/pages/c_api/index.md`, `fern/pages/python_api/index.md` | Fern reference pages document the C function signature and DLPack tensor requirements and the Python fit API; navigation indexes updated with new links. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rapidsai/cuvs#2017: Changes to multi‑GPU/batched k-means internals that are closely related to the SNMG fit execution and dispatch.

Suggested labels

improvement, non-breaking, cpp

Suggested reviewers

  • tarang-jain
  • cjnolet
  • dantegd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.34%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'SNMG Batched KMeans Python API' accurately reflects the main objective of implementing the SNMG Batched KMeans Python API as shown across the python/ directory changes. |
| Description check | ✅ Passed | The PR description 'Closes #2149 and #2155' directly references the linked issues that define the PR's objectives, establishing clear traceability. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #2149 are met: a complete Python API for SNMG KMeans is implemented across python/cuvs/cuvs/cluster/mg/kmeans/ with fit function, proper validation, C-API binding, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: C API additions (header, implementation, tests), Python API implementation (Cython bindings, modules, tests), CMake build integration, and documentation for SNMG KMeans. No unrelated changes detected. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
c/tests/cluster/kmeans_mg_c.cu (1)

112-115: 💤 Low value

Centroid comparison may be sensitive to cluster ordering.

The test compares centroids positionally, expecting {1.5, 1.5, 10.5, 10.5}. While the well-separated initial centroids {0,0} and {12,12} should converge deterministically to the two clusters, k-means implementations may still reorder clusters internally. If this test ever becomes flaky, consider sorting centroids before comparison or using an order-invariant comparison.
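
The order-invariant comparison suggested above can be sketched in a few lines of NumPy. This is only an illustration, not the test code from the PR: the helper names `sort_centroids` and `centroids_match` are hypothetical, and the expected values are read as two clusters with two features each, which is an assumption based on the initial centroids {0,0} and {12,12}.

```python
import numpy as np


def sort_centroids(centroids):
    """Return centroid rows in lexicographic order (hypothetical helper)."""
    centroids = np.asarray(centroids)
    # np.lexsort uses the LAST key as the primary key, so the feature
    # columns are passed in reverse to make feature 0 the primary key.
    order = np.lexsort(centroids.T[::-1])
    return centroids[order]


def centroids_match(actual, expected, atol=1e-3):
    """Compare two (n_clusters, n_features) arrays, ignoring row order."""
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    if actual.shape != expected.shape:
        return False
    return bool(
        np.allclose(sort_centroids(actual), sort_centroids(expected), atol=atol)
    )


# Assumed layout: 2 clusters x 2 features, matching {1.5, 1.5, 10.5, 10.5}.
expected = np.array([[1.5, 1.5], [10.5, 10.5]])
swapped = np.array([[10.5, 10.5], [1.5, 1.5]])
print(centroids_match(swapped, expected))  # prints True
```

Lexicographic sorting is adequate when clusters are well separated in the first feature, as they are here. If two centroids could have nearly equal leading features, a nearest-neighbour matching between the two sets would be the more robust choice.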

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/tests/cluster/kmeans_mg_c.cu` around lines 112 - 115, The positional
comparison of centroids (centroids_data vs kExpectedCentroids) is sensitive to
cluster ordering; change the test to perform an order-invariant comparison by
grouping centroids into kNClusters vectors of length kNFeatures (using
centroids_data and kNFeatures to slice), sort those centroid vectors using a
deterministic key (e.g., lexicographic compare on feature values or by the first
feature), do the same sorting for the expected centroids (kExpectedCentroids),
and then run EXPECT_NEAR pairwise on the sorted lists to ensure the test is
robust to cluster reordering.
c/src/cluster/mg_kmeans.cpp (1)

144-144: 💤 Low value

Unqualified Array at mg_kmeans.cpp isn’t an issue, but can be made clearer

  • Array is an enumerator from the C API unscoped enum cuvsKMeansInitMethod (defined in c/include/cuvs/cluster/kmeans.h and pulled in via mg_kmeans.h), so it’s expected that if (params.init == Array) uses an unqualified name—there’s no cuvsKMeansInitMethod::Array form to qualify.
  • Optional: move convert_params(params) before the check and compare kmeans_params.init to cuvs::cluster::kmeans::params::InitMethod::Array for consistency with the C++ enum.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/src/cluster/mg_kmeans.cpp` at line 144, The condition uses the unscoped C
enum value Array (params.init == Array); to make it clearer/consistent, call
convert_params(params) first to build the C++ struct and then compare the C++
enum: replace the direct check of params.init with a check against
kmeans_params.init == cuvs::cluster::kmeans::params::InitMethod::Array (use
convert_params to produce kmeans_params), so the code references the C++ scoped
enum rather than the unqualified C enumerator.
python/cuvs/cuvs/cluster/mg/kmeans/__init__.py (1)

1-9: ⚡ Quick win

Consider adding a module docstring.

This module serves as the public API entry point for single-node multi-GPU k-means. A brief docstring would help users understand the package's purpose and available exports.

📝 Suggested module docstring
 # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
 # SPDX-License-Identifier: Apache-2.0
 
+"""Single-node multi-GPU (SNMG) k-means clustering.
+
+This module provides k-means fitting across multiple GPUs on a single node,
+with host-memory input arrays distributed across available devices.
+"""
+
 from cuvs.cluster.kmeans import KMeansParams
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/cluster/mg/kmeans/__init__.py` around lines 1 - 9, Add a
concise module docstring at the top of the module to describe that this package
is the public API entry point for single-node multi-GPU k-means and list the
primary exports; update the module containing KMeansParams, FitOutput, and fit
to include a short triple-quoted string explaining purpose, intended use, and
the exported symbols (FitOutput, KMeansParams, fit) so users see what this
subpackage provides.
python/cuvs/cuvs/tests/test_mg_kmeans.py (1)

31-47: ⚡ Quick win

Consider adding docstrings to test helper functions.

The helper functions make_inputs, make_sample_weights, and predict_labels_host lack documentation explaining their purpose, parameters, and return values, which would improve test maintainability.

Also applies to: 50-57

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_mg_kmeans.py` around lines 31 - 47, Add concise
docstrings to the test helper functions make_inputs, make_sample_weights, and
predict_labels_host describing their purpose, parameters (dtype, n_rows, n_cols,
n_clusters where applicable), return values (e.g., X and centroids for
make_inputs; sample weights array for make_sample_weights; predicted labels for
predict_labels_host), and any important behavior (e.g., deterministic RNG seeds
and array contiguity). Place the docstring immediately under each function
signature using a short triple-quoted string.
python/cuvs/cuvs/tests/test_kmeans.py (1)

18-28: ⚡ Quick win

Consider adding a docstring to the helper function.

The make_well_separated_kmeans_input helper generates synthetic test data but lacks documentation explaining its purpose, parameters, and return values.

📝 Suggested docstring
 def make_well_separated_kmeans_input(rng, n_rows, n_cols, n_clusters, dtype):
+    """Generate well-separated synthetic k-means input with deterministic structure.
+    
+    Creates cluster centers with large separation (scale=10.0) and adds small
+    Gaussian noise (scale=0.01) to ensure clusters remain distinct.
+    
+    Args:
+        rng: NumPy random generator
+        n_rows: Number of data points
+        n_cols: Number of features
+        n_clusters: Number of clusters
+        dtype: NumPy dtype for the output arrays
+        
+    Returns:
+        Tuple of (X, initial_centroids) as contiguous NumPy arrays
+    """
     labels = np.arange(n_rows) % n_clusters
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_kmeans.py` around lines 18 - 28, Add a clear
docstring to the helper function make_well_separated_kmeans_input describing its
purpose (generate well-separated KMeans test data), parameters (rng: random
generator, n_rows: int, n_cols: int, n_clusters: int, dtype: numpy dtype), and
return values (X: contiguous ndarray of shape (n_rows, n_cols) with clustered
samples, initial_centroids: ndarray of shape (n_clusters, n_cols) containing the
initial centroids copied from X); place the docstring immediately below the def
line and mention that X is returned as a contiguous array and that
initial_centroids is a copy used for initialization.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7a754ca0-8545-4e19-a93e-76804f4836ed

📥 Commits

Reviewing files that changed from the base of the PR and between 0c3d007 and 89f2ca0.

📒 Files selected for processing (23)
  • c/CMakeLists.txt
  • c/include/cuvs/cluster/mg_kmeans.h
  • c/include/cuvs/core/all.h
  • c/src/cluster/mg_kmeans.cpp
  • c/tests/CMakeLists.txt
  • c/tests/cluster/kmeans_mg_c.cu
  • fern/docs.yml
  • fern/pages/c_api/c-api-cluster-mg-kmeans.md
  • fern/pages/c_api/index.md
  • fern/pages/python_api/index.md
  • fern/pages/python_api/python-api-cluster-mg-kmeans.md
  • python/cuvs/cuvs/cluster/CMakeLists.txt
  • python/cuvs/cuvs/cluster/__init__.py
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pyx
  • python/cuvs/cuvs/cluster/mg/CMakeLists.txt
  • python/cuvs/cuvs/cluster/mg/__init__.py
  • python/cuvs/cuvs/cluster/mg/kmeans/CMakeLists.txt
  • python/cuvs/cuvs/cluster/mg/kmeans/__init__.py
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx
  • python/cuvs/cuvs/tests/test_kmeans.py
  • python/cuvs/cuvs/tests/test_mg_kmeans.py
💤 Files with no reviewable changes (1)
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pyx

Comment on lines +34 to +36
* @param[in] res cuvsMultiGpuResources_t opaque C handle
* created by cuvsMultiGpuResourcesCreate or
* cuvsMultiGpuResourcesCreateWithDeviceIds.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Documentation refers to cuvsMultiGpuResources_t but parameter is cuvsResources_t.

The Doxygen comment at line 34 mentions cuvsMultiGpuResources_t as the expected handle type, but the actual parameter type in the function signature (line 50) is cuvsResources_t. While the implementation validates that the handle is a multi-GPU resource, the documentation could cause confusion for API consumers.

📝 Suggested documentation fix
-* @param[in]     res           cuvsMultiGpuResources_t opaque C handle
+* @param[in]     res           cuvsResources_t opaque C handle (must be a multi-GPU resource)
                               created by cuvsMultiGpuResourcesCreate or
                               cuvsMultiGpuResourcesCreateWithDeviceIds.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- * @param[in] res cuvsMultiGpuResources_t opaque C handle
- * created by cuvsMultiGpuResourcesCreate or
- * cuvsMultiGpuResourcesCreateWithDeviceIds.
+ * @param[in] res cuvsResources_t opaque C handle (must be a multi-GPU resource)
+ * created by cuvsMultiGpuResourcesCreate or
+ * cuvsMultiGpuResourcesCreateWithDeviceIds.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/include/cuvs/cluster/mg_kmeans.h` around lines 34 - 36, Update the Doxygen
for the function in mg_kmeans.h so the documented parameter type matches the
actual signature: replace references to cuvsMultiGpuResources_t with
cuvsResources_t and add a short note that the cuvsResources_t must represent a
multi-GPU resource created by cuvsMultiGpuResourcesCreate or
cuvsMultiGpuResourcesCreateWithDeviceIds (the implementation already validates
this). Ensure the comment references the exact symbol cuvsResources_t and
mentions the creation functions cuvsMultiGpuResourcesCreate /
cuvsMultiGpuResourcesCreateWithDeviceIds to avoid confusion.

## Cluster

- [Kmeans](/api-reference/python-api-cluster-kmeans)
- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a distinct label for the multi-GPU page.

This adds a second identical "Kmeans" label, making the two links ambiguous in the sidebar.

🧭 Proposed fix
-- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)
+- [Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)
+- [Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/python_api/index.md` at line 8, The sidebar shows two identical
"[Kmeans]" link labels; update the link label for the multi-GPU page to a
distinct name (e.g., "Kmeans (multi-GPU)" or "Kmeans — multi‑GPU") so the entry
referencing "/api-reference/python-api-cluster-mg-kmeans" is unambiguous; edit
the link text in fern/pages/python_api/index.md where the label "[Kmeans]"
appears and leave the URL unchanged.

| `X` | `host array-like` | Training instances, shape (m, k). Must be C-contiguous float32 or float64 host data. |
| `centroids` | `host array-like, optional` | Initial centroids when ``params.init_method == "Array"`` and output centroids for all init methods. If omitted, a host NumPy output array is allocated unless ``init_method == "Array"``. |
| `sample_weights` | `host array-like, optional` | Optional weights per observation. Must be C-contiguous and have the same dtype as X. |
| `resources` | `cuvs.common.Resources, optional` | |
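
The input rules in the rows above (C-contiguous, float32 or float64, weights sharing the dtype of `X`) can be sketched as a small host-side check. This is not the `_as_host_array` helper from the PR, whose body is not shown here: `check_host_inputs` is a hypothetical name, it accepts only NumPy arrays for simplicity, and the requirement that the weights have one entry per row of `X` is an assumption drawn from "weights per observation".

```python
import numpy as np

_ALLOWED_DTYPES = (np.dtype(np.float32), np.dtype(np.float64))


def check_host_inputs(X, sample_weights=None):
    """Validate host inputs against the documented rules (hypothetical helper)."""
    if not isinstance(X, np.ndarray):
        raise TypeError("X must be a host NumPy array")
    if X.ndim != 2:
        raise ValueError("X must be 2-D with shape (m, k)")
    if X.dtype not in _ALLOWED_DTYPES:
        raise ValueError("X must be float32 or float64")
    if not X.flags.c_contiguous:
        raise ValueError("X must be C-contiguous")

    if sample_weights is not None:
        if not isinstance(sample_weights, np.ndarray):
            raise TypeError("sample_weights must be a host NumPy array")
        if sample_weights.dtype != X.dtype:
            raise ValueError("sample_weights must have the same dtype as X")
        if not sample_weights.flags.c_contiguous:
            raise ValueError("sample_weights must be C-contiguous")
        # Assumption: one weight per observation, i.e. per row of X.
        if sample_weights.shape != (X.shape[0],):
            raise ValueError("sample_weights must have shape (m,)")


X = np.zeros((4, 3), dtype=np.float32)
check_host_inputs(X, np.ones(4, dtype=np.float32))  # passes silently
```

Raising before any conversion keeps the error close to the caller; a transposed view such as `X.T` would be rejected by the contiguity check rather than silently copied.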

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the resources parameter behavior.

resources is currently undocumented (empty description), which makes the API contract incomplete for users.

📝 Proposed doc fix
-| `resources` | `cuvs.common.Resources, optional` |  |
+| `resources` | `cuvs.common.Resources, optional` | Multi-GPU resources handle. If omitted, default resources are used by the wrapper. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-| `resources` | `cuvs.common.Resources, optional` | |
+| `resources` | `cuvs.common.Resources, optional` | Multi-GPU resources handle. If omitted, default resources are used by the wrapper. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/python_api/python-api-cluster-mg-kmeans.md` at line 27, The
`resources` parameter on the KMeans cluster API is undocumented; update the
`resources` row in python-api-cluster-mg-kmeans.md to describe its behavior:
state that `resources` is an optional cuvs.common.Resources object controlling
compute resources (CPUs, GPUs, memory) for training/inference, document the
default when omitted, list accepted fields/units (e.g., cpu_count, gpu_count,
memory_gb) and how the cluster uses them (scheduling/training limits), and add a
short usage example showing how to pass a cuvs.common.Resources instance;
reference the `resources` symbol and its type `cuvs.common.Resources` in the
description.

Comment thread c/include/cuvs/cluster/mg_kmeans.h Outdated
* closest cluster center.
* @param[out] n_iter Number of iterations run.
*/
CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res,

Contributor

We don't need this suffix. We can make breaking changes in this release.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
c/src/cluster/mg_kmeans.cpp (1)

129-147: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ignoring DLTensor.byte_offset breaks sliced DLPack inputs and outputs.

These host/device views are built from tensor.data directly, but DLPack view tensors can carry a non-zero byte_offset. That means X, sample_weight, and centroids can all be read from the wrong address, and the final centroid copy-back can overwrite the wrong host region.

Possible fix
+template <typename T>
+T* dlpack_ptr(const DLTensor& tensor)
+{
+  return reinterpret_cast<T*>(static_cast<char*>(tensor.data) + tensor.byte_offset);
+}
+
   auto X_view = raft::make_host_matrix_view<T const, IdxT>(
-    reinterpret_cast<T const*>(X.data), n_samples, n_features);
+    dlpack_ptr<T const>(X), n_samples, n_features);
 
   std::optional<raft::host_vector_view<T const, IdxT>> sample_weight;
   if (sample_weight_tensor != nullptr) {
     auto sw = sample_weight_tensor->dl_tensor;
-    sample_weight =
-      raft::make_host_vector_view<T const, IdxT>(reinterpret_cast<T const*>(sw.data), n_samples);
+    sample_weight = raft::make_host_vector_view<T const, IdxT>(
+      dlpack_ptr<T const>(sw), n_samples);
   }
 
   if (params.init == Array) {
     raft::update_device(d_centroids.data_handle(),
-                        reinterpret_cast<T const*>(centroids.data),
+                        dlpack_ptr<T const>(centroids),
                         n_centroid_values,
                         stream);
     raft::resource::sync_stream(rank0_res, stream);
   }
 
   raft::update_host(
-    reinterpret_cast<T*>(centroids.data), d_centroids.data_handle(), n_centroid_values, stream);
+    dlpack_ptr<T>(centroids), d_centroids.data_handle(), n_centroid_values, stream);

Also applies to: 163-164
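
The effect of ignoring `byte_offset` can be modelled without any DLPack structs. The sketch below is a plain NumPy stand-in, not the C++ fix: `view_from_dlpack_fields` is a hypothetical helper, and a `bytes` object plays the role of the memory behind `tensor.data`.

```python
import numpy as np


def view_from_dlpack_fields(buffer, byte_offset, shape, dtype):
    """Build a read-only view starting at base address + byte_offset."""
    count = int(np.prod(shape))
    flat = np.frombuffer(buffer, dtype=dtype, count=count, offset=byte_offset)
    return flat.reshape(shape)


base = np.arange(12, dtype=np.float32)
raw = base.tobytes()  # stands in for the allocation behind tensor.data

# A "sliced" tensor that starts at element 4 and spans 2 rows x 4 columns.
byte_offset = 4 * base.itemsize

correct = view_from_dlpack_fields(raw, byte_offset, (2, 4), np.float32)
ignored = view_from_dlpack_fields(raw, 0, (2, 4), np.float32)

print(correct[0])  # [4. 5. 6. 7.]
print(ignored[0])  # [0. 1. 2. 3.]  <- what a consumer reads if it skips byte_offset
```

The same shift applies to writes: copying results back to the base address instead of base plus `byte_offset` lands in the wrong region, which is the centroid copy-back hazard described above.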

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/src/cluster/mg_kmeans.cpp` around lines 129 - 147, The host/device views
currently use tensor.data directly and ignore DLTensor.byte_offset, which breaks
sliced DLPack tensors; fix by computing the true element pointer for X,
sample_weight, and centroids using the dl_tensor.byte_offset (i.e., add
byte_offset bytes to the raw data pointer before reinterpret_cast to T const*)
when creating X_view (raft::make_host_matrix_view), sample_weight
(raft::make_host_vector_view), and when passing centroids into
raft::update_device and the final device-to-host copy (the block around
params.init == Array and the later copy-back at lines ~163-164); ensure you use
the dl_tensor->data as a char* (or uint8_t*) plus byte_offset then cast to T
const* so all views/copies reference the correct sliced memory.
c/tests/cluster/kmeans_mg_c.cu (1)

66-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test never exercises the multi-GPU path.

cuvsMultiGpuResourcesCreateWithDeviceIds is fed a single device id, so the new SNMG API can pass here without covering any cross-device distribution or synchronization behavior. For a feature whose contract is single-node multi-GPU KMeans, that is a real coverage gap.

Possible fix
   cuvsResources_t res;
-  int32_t device_ids[1] = {0};
+  int device_count = 0;
+  ASSERT_EQ(cudaGetDeviceCount(&device_count), cudaSuccess);
+  if (device_count < 2) { GTEST_SKIP() << "requires at least 2 GPUs"; }
+
+  int32_t device_ids[2] = {0, 1};
   DLManagedTensor device_ids_t{};
-  cuvs::core::to_dlpack(raft::make_host_vector_view<int32_t, int64_t>(device_ids, 1),
+  cuvs::core::to_dlpack(raft::make_host_vector_view<int32_t, int64_t>(device_ids, 2),
                         &device_ids_t);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/tests/cluster/kmeans_mg_c.cu` around lines 66 - 71, The test only passes a
single device id so it never triggers multi-GPU behavior; update the setup for
cuvsMultiGpuResourcesCreateWithDeviceIds to provide at least two device IDs
(e.g., int32_t device_ids[2] = {0,1}), create the corresponding DLManagedTensor
(device_ids_t) for the correct length, and call
cuvsMultiGpuResourcesCreateWithDeviceIds(&res, &device_ids_t) to exercise
cross-device logic; additionally, add a runtime guard that queries available GPU
count and skips the test if fewer than two devices are present so the test
remains robust on single-GPU CI.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 67b5b51e-db2b-40be-ab6d-0d8cd34ebd88

📥 Commits

Reviewing files that changed from the base of the PR and between b2eb0d7 and 684b6d1.

📒 Files selected for processing (7)
  • c/include/cuvs/cluster/mg_kmeans.h
  • c/src/cluster/mg_kmeans.cpp
  • c/tests/cluster/kmeans_mg_c.cu
  • fern/pages/c_api/c-api-cluster-mg-kmeans.md
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx
✅ Files skipped from review due to trivial changes (1)
  • fern/pages/c_api/c-api-cluster-mg-kmeans.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx

@viclafargue viclafargue requested a review from tarang-jain June 15, 2026 09:07

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

SNMG Batched KMeans Python API and benchmarking

3 participants