SNMG Batched KMeans Python API#2154
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds single-node multi-GPU K-Means: a C API and implementation accepting DLPack tensors, Python Cython bindings that validate/convert host NumPy inputs and call the C API, GPU-gated tests, build wiring, and Fern API docs/navigation. ChangesMulti-GPU K-Means APIs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
c/tests/cluster/kmeans_mg_c.cu (1)
112-115: 💤 Low valueCentroid comparison may be sensitive to cluster ordering.
The test compares centroids positionally, expecting
{1.5, 1.5, 10.5, 10.5}. While the well-separated initial centroids{0,0}and{12,12}should converge deterministically to the two clusters, k-means implementations may still reorder clusters internally. If this test ever becomes flaky, consider sorting centroids before comparison or using an order-invariant comparison.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@c/tests/cluster/kmeans_mg_c.cu` around lines 112 - 115, The positional comparison of centroids (centroids_data vs kExpectedCentroids) is sensitive to cluster ordering; change the test to perform an order-invariant comparison by grouping centroids into kNClusters vectors of length kNFeatures (using centroids_data and kNFeatures to slice), sort those centroid vectors using a deterministic key (e.g., lexicographic compare on feature values or by the first feature), do the same sorting for the expected centroids (kExpectedCentroids), and then run EXPECT_NEAR pairwise on the sorted lists to ensure the test is robust to cluster reordering.c/src/cluster/mg_kmeans.cpp (1)
144-144: 💤 Low valueUnqualified
Arrayatmg_kmeans.cppisn’t an issue, but can be made clearer
Arrayis an enumerator from the C API unscoped enumcuvsKMeansInitMethod(defined inc/include/cuvs/cluster/kmeans.hand pulled in viamg_kmeans.h), so it’s expected thatif (params.init == Array)uses an unqualified name—there’s nocuvsKMeansInitMethod::Arrayform to qualify.- Optional: move
convert_params(params)before the check and comparekmeans_params.inittocuvs::cluster::kmeans::params::InitMethod::Arrayfor consistency with the C++ enum.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@c/src/cluster/mg_kmeans.cpp` at line 144, The condition uses the unscoped C enum value Array (params.init == Array); to make it clearer/consistent, call convert_params(params) first to build the C++ struct and then compare the C++ enum: replace the direct check of params.init with a check against kmeans_params.init == cuvs::cluster::kmeans::params::InitMethod::Array (use convert_params to produce kmeans_params), so the code references the C++ scoped enum rather than the unqualified C enumerator.python/cuvs/cuvs/cluster/mg/kmeans/__init__.py (1)
1-9: ⚡ Quick winConsider adding a module docstring.
This module serves as the public API entry point for single-node multi-GPU k-means. A brief docstring would help users understand the package's purpose and available exports.
📝 Suggested module docstring
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. # SPDX-License-Identifier: Apache-2.0 +"""Single-node multi-GPU (SNMG) k-means clustering. + +This module provides k-means fitting across multiple GPUs on a single node, +with host-memory input arrays distributed across available devices. +""" + from cuvs.cluster.kmeans import KMeansParams🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/cluster/mg/kmeans/__init__.py` around lines 1 - 9, Add a concise module docstring at the top of the module to describe that this package is the public API entry point for single-node multi-GPU k-means and list the primary exports; update the module containing KMeansParams, FitOutput, and fit to include a short triple-quoted string explaining purpose, intended use, and the exported symbols (FitOutput, KMeansParams, fit) so users see what this subpackage provides.python/cuvs/cuvs/tests/test_mg_kmeans.py (1)
31-47: ⚡ Quick winConsider adding docstrings to test helper functions.
The helper functions
make_inputs,make_sample_weights, andpredict_labels_hostlack documentation explaining their purpose, parameters, and return values, which would improve test maintainability.Also applies to: 50-57
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/tests/test_mg_kmeans.py` around lines 31 - 47, Add concise docstrings to the test helper functions make_inputs, make_sample_weights, and predict_labels_host describing their purpose, parameters (dtype, n_rows, n_cols, n_clusters where applicable), return values (e.g., X and centroids for make_inputs; sample weights array for make_sample_weights; predicted labels for predict_labels_host), and any important behavior (e.g., deterministic RNG seeds and array contiguity). Place the docstring immediately under each function signature using a short triple-quoted string.python/cuvs/cuvs/tests/test_kmeans.py (1)
18-28: ⚡ Quick winConsider adding a docstring to the helper function.
The
make_well_separated_kmeans_inputhelper generates synthetic test data but lacks documentation explaining its purpose, parameters, and return values.📝 Suggested docstring
def make_well_separated_kmeans_input(rng, n_rows, n_cols, n_clusters, dtype): + """Generate well-separated synthetic k-means input with deterministic structure. + + Creates cluster centers with large separation (scale=10.0) and adds small + Gaussian noise (scale=0.01) to ensure clusters remain distinct. + + Args: + rng: NumPy random generator + n_rows: Number of data points + n_cols: Number of features + n_clusters: Number of clusters + dtype: NumPy dtype for the output arrays + + Returns: + Tuple of (X, initial_centroids) as contiguous NumPy arrays + """ labels = np.arange(n_rows) % n_clusters🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/tests/test_kmeans.py` around lines 18 - 28, Add a clear docstring to the helper function make_well_separated_kmeans_input describing its purpose (generate well-separated KMeans test data), parameters (rng: random generator, n_rows: int, n_cols: int, n_clusters: int, dtype: numpy dtype), and return values (X: contiguous ndarray of shape (n_rows, n_cols) with clustered samples, initial_centroids: ndarray of shape (n_clusters, n_cols) containing the initial centroids copied from X); place the docstring immediately below the def line and mention that X is returned as a contiguous array and that initial_centroids is a copy used for initialization.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@c/include/cuvs/cluster/mg_kmeans.h`:
- Around line 34-36: Update the Doxygen for the function in mg_kmeans.h so the
documented parameter type matches the actual signature: replace references to
cuvsMultiGpuResources_t with cuvsResources_t and add a short note that the
cuvsResources_t must represent a multi-GPU resource created by
cuvsMultiGpuResourcesCreate or cuvsMultiGpuResourcesCreateWithDeviceIds (the
implementation already validates this). Ensure the comment references the exact
symbol cuvsResources_t and mentions the creation functions
cuvsMultiGpuResourcesCreate / cuvsMultiGpuResourcesCreateWithDeviceIds to avoid
confusion.
In `@fern/pages/python_api/index.md`:
- Line 8: The sidebar shows two identical "[Kmeans]" link labels; update the
link label for the multi-GPU page to a distinct name (e.g., "Kmeans (multi-GPU)"
or "Kmeans — multi‑GPU") so the entry referencing
"/api-reference/python-api-cluster-mg-kmeans" is unambiguous; edit the link text
in fern/pages/python_api/index.md where the label "[Kmeans]" appears and leave
the URL unchanged.
In `@fern/pages/python_api/python-api-cluster-mg-kmeans.md`:
- Line 27: The `resources` parameter on the KMeans cluster API is undocumented;
update the `resources` row in python-api-cluster-mg-kmeans.md to describe its
behavior: state that `resources` is an optional cuvs.common.Resources object
controlling compute resources (CPUs, GPUs, memory) for training/inference,
document the default when omitted, list accepted fields/units (e.g., cpu_count,
gpu_count, memory_gb) and how the cluster uses them (scheduling/training
limits), and add a short usage example showing how to pass a
cuvs.common.Resources instance; reference the `resources` symbol and its type
`cuvs.common.Resources` in the description.
---
Nitpick comments:
In `@c/src/cluster/mg_kmeans.cpp`:
- Line 144: The condition uses the unscoped C enum value Array (params.init ==
Array); to make it clearer/consistent, call convert_params(params) first to
build the C++ struct and then compare the C++ enum: replace the direct check of
params.init with a check against kmeans_params.init ==
cuvs::cluster::kmeans::params::InitMethod::Array (use convert_params to produce
kmeans_params), so the code references the C++ scoped enum rather than the
unqualified C enumerator.
In `@c/tests/cluster/kmeans_mg_c.cu`:
- Around line 112-115: The positional comparison of centroids (centroids_data vs
kExpectedCentroids) is sensitive to cluster ordering; change the test to perform
an order-invariant comparison by grouping centroids into kNClusters vectors of
length kNFeatures (using centroids_data and kNFeatures to slice), sort those
centroid vectors using a deterministic key (e.g., lexicographic compare on
feature values or by the first feature), do the same sorting for the expected
centroids (kExpectedCentroids), and then run EXPECT_NEAR pairwise on the sorted
lists to ensure the test is robust to cluster reordering.
In `@python/cuvs/cuvs/cluster/mg/kmeans/__init__.py`:
- Around line 1-9: Add a concise module docstring at the top of the module to
describe that this package is the public API entry point for single-node
multi-GPU k-means and list the primary exports; update the module containing
KMeansParams, FitOutput, and fit to include a short triple-quoted string
explaining purpose, intended use, and the exported symbols (FitOutput,
KMeansParams, fit) so users see what this subpackage provides.
In `@python/cuvs/cuvs/tests/test_kmeans.py`:
- Around line 18-28: Add a clear docstring to the helper function
make_well_separated_kmeans_input describing its purpose (generate well-separated
KMeans test data), parameters (rng: random generator, n_rows: int, n_cols: int,
n_clusters: int, dtype: numpy dtype), and return values (X: contiguous ndarray
of shape (n_rows, n_cols) with clustered samples, initial_centroids: ndarray of
shape (n_clusters, n_cols) containing the initial centroids copied from X);
place the docstring immediately below the def line and mention that X is
returned as a contiguous array and that initial_centroids is a copy used for
initialization.
In `@python/cuvs/cuvs/tests/test_mg_kmeans.py`:
- Around line 31-47: Add concise docstrings to the test helper functions
make_inputs, make_sample_weights, and predict_labels_host describing their
purpose, parameters (dtype, n_rows, n_cols, n_clusters where applicable), return
values (e.g., X and centroids for make_inputs; sample weights array for
make_sample_weights; predicted labels for predict_labels_host), and any
important behavior (e.g., deterministic RNG seeds and array contiguity). Place
the docstring immediately under each function signature using a short
triple-quoted string.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7a754ca0-8545-4e19-a93e-76804f4836ed
📒 Files selected for processing (23)
c/CMakeLists.txtc/include/cuvs/cluster/mg_kmeans.hc/include/cuvs/core/all.hc/src/cluster/mg_kmeans.cppc/tests/CMakeLists.txtc/tests/cluster/kmeans_mg_c.cufern/docs.ymlfern/pages/c_api/c-api-cluster-mg-kmeans.mdfern/pages/c_api/index.mdfern/pages/python_api/index.mdfern/pages/python_api/python-api-cluster-mg-kmeans.mdpython/cuvs/cuvs/cluster/CMakeLists.txtpython/cuvs/cuvs/cluster/__init__.pypython/cuvs/cuvs/cluster/kmeans/kmeans.pxdpython/cuvs/cuvs/cluster/kmeans/kmeans.pyxpython/cuvs/cuvs/cluster/mg/CMakeLists.txtpython/cuvs/cuvs/cluster/mg/__init__.pypython/cuvs/cuvs/cluster/mg/kmeans/CMakeLists.txtpython/cuvs/cuvs/cluster/mg/kmeans/__init__.pypython/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxdpython/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyxpython/cuvs/cuvs/tests/test_kmeans.pypython/cuvs/cuvs/tests/test_mg_kmeans.py
💤 Files with no reviewable changes (1)
- python/cuvs/cuvs/cluster/kmeans/kmeans.pyx
| * @param[in] res cuvsMultiGpuResources_t opaque C handle | ||
| * created by cuvsMultiGpuResourcesCreate or | ||
| * cuvsMultiGpuResourcesCreateWithDeviceIds. |
There was a problem hiding this comment.
Documentation refers to cuvsMultiGpuResources_t but parameter is cuvsResources_t.
The Doxygen comment at line 34 mentions cuvsMultiGpuResources_t as the expected handle type, but the actual parameter type in the function signature (line 50) is cuvsResources_t. While the implementation validates that the handle is a multi-GPU resource, the documentation could cause confusion for API consumers.
📝 Suggested documentation fix
-* `@param`[in] res cuvsMultiGpuResources_t opaque C handle
+* `@param`[in] res cuvsResources_t opaque C handle (must be a multi-GPU resource)
created by cuvsMultiGpuResourcesCreate or
cuvsMultiGpuResourcesCreateWithDeviceIds.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| * @param[in] res cuvsMultiGpuResources_t opaque C handle | |
| * created by cuvsMultiGpuResourcesCreate or | |
| * cuvsMultiGpuResourcesCreateWithDeviceIds. | |
| * `@param`[in] res cuvsResources_t opaque C handle (must be a multi-GPU resource) | |
| * created by cuvsMultiGpuResourcesCreate or | |
| * cuvsMultiGpuResourcesCreateWithDeviceIds. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@c/include/cuvs/cluster/mg_kmeans.h` around lines 34 - 36, Update the Doxygen
for the function in mg_kmeans.h so the documented parameter type matches the
actual signature: replace references to cuvsMultiGpuResources_t with
cuvsResources_t and add a short note that the cuvsResources_t must represent a
multi-GPU resource created by cuvsMultiGpuResourcesCreate or
cuvsMultiGpuResourcesCreateWithDeviceIds (the implementation already validates
this). Ensure the comment references the exact symbol cuvsResources_t and
mentions the creation functions cuvsMultiGpuResourcesCreate /
cuvsMultiGpuResourcesCreateWithDeviceIds to avoid confusion.
| ## Cluster | ||
|
|
||
| - [Kmeans](/api-reference/python-api-cluster-kmeans) | ||
| - [Kmeans](/api-reference/python-api-cluster-mg-kmeans) |
There was a problem hiding this comment.
Use a distinct label for the multi-GPU page.
This adds a second identical "Kmeans" label, making the two links ambiguous in the sidebar.
🧭 Proposed fix
-- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)
+- [Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - [Kmeans](/api-reference/python-api-cluster-mg-kmeans) | |
| - [Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/index.md` at line 8, The sidebar shows two identical
"[Kmeans]" link labels; update the link label for the multi-GPU page to a
distinct name (e.g., "Kmeans (multi-GPU)" or "Kmeans — multi‑GPU") so the entry
referencing "/api-reference/python-api-cluster-mg-kmeans" is unambiguous; edit
the link text in fern/pages/python_api/index.md where the label "[Kmeans]"
appears and leave the URL unchanged.
| | `X` | `host array-like` | Training instances, shape (m, k). Must be C-contiguous float32 or float64 host data. | | ||
| | `centroids` | `host array-like, optional` | Initial centroids when ``params.init_method == "Array"`` and output centroids for all init methods. If omitted, a host NumPy output array is allocated unless ``init_method == "Array"``. | | ||
| | `sample_weights` | `host array-like, optional` | Optional weights per observation. Must be C-contiguous and have the same dtype as X. | | ||
| | `resources` | `cuvs.common.Resources, optional` | | |
There was a problem hiding this comment.
Document the resources parameter behavior.
resources is currently undocumented (empty description), which makes the API contract incomplete for users.
📝 Proposed doc fix
-| `resources` | `cuvs.common.Resources, optional` | |
+| `resources` | `cuvs.common.Resources, optional` | Multi-GPU resources handle. If omitted, default resources are used by the wrapper. |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `resources` | `cuvs.common.Resources, optional` | | | |
| | `resources` | `cuvs.common.Resources, optional` | Multi-GPU resources handle. If omitted, default resources are used by the wrapper. | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-cluster-mg-kmeans.md` at line 27, The
`resources` parameter on the KMeans cluster API is undocumented; update the
`resources` row in python-api-cluster-mg-kmeans.md to describe its behavior:
state that `resources` is an optional cuvs.common.Resources object controlling
compute resources (CPUs, GPUs, memory) for training/inference, document the
default when omitted, list accepted fields/units (e.g., cpu_count, gpu_count,
memory_gb) and how the cluster uses them (scheduling/training limits), and add a
short usage example showing how to pass a cuvs.common.Resources instance;
reference the `resources` symbol and its type `cuvs.common.Resources` in the
description.
| * closest cluster center. | ||
| * @param[out] n_iter Number of iterations run. | ||
| */ | ||
| CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res, |
There was a problem hiding this comment.
we dont need this suffix. Can add breaking changes in this release.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
c/src/cluster/mg_kmeans.cpp (1)
129-147:⚠️ Potential issue | 🟠 Major | ⚡ Quick winIgnoring
DLTensor.byte_offsetbreaks sliced DLPack inputs and outputs.These host/device views are built from
tensor.datadirectly, but DLPack view tensors can carry a non-zerobyte_offset. That meansX,sample_weight, andcentroidscan all be read from the wrong address, and the final centroid copy-back can overwrite the wrong host region.Possible fix
+template <typename T> +T* dlpack_ptr(const DLTensor& tensor) +{ + return reinterpret_cast<T*>(static_cast<char*>(tensor.data) + tensor.byte_offset); +} + auto X_view = raft::make_host_matrix_view<T const, IdxT>( - reinterpret_cast<T const*>(X.data), n_samples, n_features); + dlpack_ptr<T const>(X), n_samples, n_features); std::optional<raft::host_vector_view<T const, IdxT>> sample_weight; if (sample_weight_tensor != nullptr) { auto sw = sample_weight_tensor->dl_tensor; - sample_weight = - raft::make_host_vector_view<T const, IdxT>(reinterpret_cast<T const*>(sw.data), n_samples); + sample_weight = raft::make_host_vector_view<T const, IdxT>( + dlpack_ptr<T const>(sw), n_samples); } if (params.init == Array) { raft::update_device(d_centroids.data_handle(), - reinterpret_cast<T const*>(centroids.data), + dlpack_ptr<T const>(centroids), n_centroid_values, stream); raft::resource::sync_stream(rank0_res, stream); } raft::update_host( - reinterpret_cast<T*>(centroids.data), d_centroids.data_handle(), n_centroid_values, stream); + dlpack_ptr<T>(centroids), d_centroids.data_handle(), n_centroid_values, stream);Also applies to: 163-164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@c/src/cluster/mg_kmeans.cpp` around lines 129 - 147, The host/device views currently use tensor.data directly and ignore DLTensor.byte_offset, which breaks sliced DLPack tensors; fix by computing the true element pointer for X, sample_weight, and centroids using the dl_tensor.byte_offset (i.e., add byte_offset bytes to the raw data pointer before reinterpret_cast to T const*) when creating X_view (raft::make_host_matrix_view), sample_weight (raft::make_host_vector_view), and when passing centroids into raft::update_device and the final device-to-host copy (the block around params.init == Array and the later copy-back at lines ~163-164); ensure you use the dl_tensor->data as a char* (or uint8_t*) plus byte_offset then cast to T const* so all views/copies reference the correct sliced memory.c/tests/cluster/kmeans_mg_c.cu (1)
66-71:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis test never exercises the multi-GPU path.
cuvsMultiGpuResourcesCreateWithDeviceIdsis fed a single device id, so the new SNMG API can pass here without covering any cross-device distribution or synchronization behavior. For a feature whose contract is single-node multi-GPU KMeans, that is a real coverage gap.Possible fix
cuvsResources_t res; - int32_t device_ids[1] = {0}; + int device_count = 0; + ASSERT_EQ(cudaGetDeviceCount(&device_count), cudaSuccess); + if (device_count < 2) { GTEST_SKIP() << "requires at least 2 GPUs"; } + + int32_t device_ids[2] = {0, 1}; DLManagedTensor device_ids_t{}; - cuvs::core::to_dlpack(raft::make_host_vector_view<int32_t, int64_t>(device_ids, 1), + cuvs::core::to_dlpack(raft::make_host_vector_view<int32_t, int64_t>(device_ids, 2), &device_ids_t);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@c/tests/cluster/kmeans_mg_c.cu` around lines 66 - 71, The test only passes a single device id so it never triggers multi-GPU behavior; update the setup for cuvsMultiGpuResourcesCreateWithDeviceIds to provide at least two device IDs (e.g., int32_t device_ids[2] = {0,1}), create the corresponding DLManagedTensor (device_ids_t) for the correct length, and call cuvsMultiGpuResourcesCreateWithDeviceIds(&res, &device_ids_t) to exercise cross-device logic; additionally, add a runtime guard that queries available GPU count and skips the test if fewer than two devices are present so the test remains robust on single-GPU CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@c/src/cluster/mg_kmeans.cpp`:
- Around line 129-147: The host/device views currently use tensor.data directly
and ignore DLTensor.byte_offset, which breaks sliced DLPack tensors; fix by
computing the true element pointer for X, sample_weight, and centroids using the
dl_tensor.byte_offset (i.e., add byte_offset bytes to the raw data pointer
before reinterpret_cast to T const*) when creating X_view
(raft::make_host_matrix_view), sample_weight (raft::make_host_vector_view), and
when passing centroids into raft::update_device and the final device-to-host
copy (the block around params.init == Array and the later copy-back at lines
~163-164); ensure you use the dl_tensor->data as a char* (or uint8_t*) plus
byte_offset then cast to T const* so all views/copies reference the correct
sliced memory.
In `@c/tests/cluster/kmeans_mg_c.cu`:
- Around line 66-71: The test only passes a single device id so it never
triggers multi-GPU behavior; update the setup for
cuvsMultiGpuResourcesCreateWithDeviceIds to provide at least two device IDs
(e.g., int32_t device_ids[2] = {0,1}), create the corresponding DLManagedTensor
(device_ids_t) for the correct length, and call
cuvsMultiGpuResourcesCreateWithDeviceIds(&res, &device_ids_t) to exercise
cross-device logic; additionally, add a runtime guard that queries available GPU
count and skips the test if fewer than two devices are present so the test
remains robust on single-GPU CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 67b5b51e-db2b-40be-ab6d-0d8cd34ebd88
📒 Files selected for processing (7)
c/include/cuvs/cluster/mg_kmeans.hc/src/cluster/mg_kmeans.cppc/tests/cluster/kmeans_mg_c.cufern/pages/c_api/c-api-cluster-mg-kmeans.mdpython/cuvs/cuvs/cluster/kmeans/kmeans.pxdpython/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxdpython/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx
✅ Files skipped from review due to trivial changes (1)
- fern/pages/c_api/c-api-cluster-mg-kmeans.md
🚧 Files skipped from review as they are similar to previous changes (2)
- python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd
- python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx
Closes #2149 and #2155