
Remove CUDA whole compilation programming model violations #1347

Open
robertmaynard wants to merge 1 commit into NVIDIA:main from robertmaynard:remove_CUDA_whole_compilation_violations

@robertmaynard (Contributor)

Previously we tried to launch kernels from files like feasibility_jump that are compiled into other files (e.g. feasibility_jump_kernels). This isn't allowed by the CUDA whole compilation programming model.

We remove the static-global-template-stub=false flag to enforce valid CUDA programs going forward.

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

This PR centralizes CUDA kernel-dimension calculation and launches into templated host helpers (get_launch_dims_* and launch_*) for feasibility-jump and routing modules, updates call sites to use packed void** args and RAFT_CUDA_TRY wrappers, and adjusts CMake include exports for generated headers.
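For readers unfamiliar with the "packed void** args" convention: cudaLaunchKernel (and cudaLaunchCooperativeKernel) receive kernel arguments as an array in which element i holds the address of the i-th kernel parameter, and the runtime copies each value from that address at launch time. The host-only sketch below mimics that contract without a GPU; `view_t`, `fake_kernel`, and `mock_launch` are hypothetical names for illustration, not cuOpt code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a kernel's first parameter (a device "view" struct).
struct view_t {
  int n_vars;
  int n_cstrs;
};

// Hypothetical "kernel" with parameters (view_t, int); it returns a value
// instead of doing device work so the result can be checked.
int fake_kernel(view_t view, int offset) { return view.n_vars + view.n_cstrs + offset; }

// Mimics how a launch API consumes `void** args`: args[i] is the address of the
// i-th parameter, and the value is copied out of that address.
int mock_launch(void** args, std::size_t n_args)
{
  assert(n_args == 2 && "fake_kernel takes exactly two parameters");
  view_t view = *static_cast<view_t*>(args[0]);
  int offset  = *static_cast<int*>(args[1]);
  return fake_kernel(view, offset);
}
```

For example, with `view_t v{3, 4}; int offset = 5; void* args[] = {&v, &offset};`, `mock_launch(args, 2)` returns 12. Because the array stores addresses, every referenced local must still be alive when the launch call runs, which is why the call sites in the review below build `args` immediately before calling the launcher.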

Changes

Kernel Launcher Infrastructure Modernization

Layer: Build configuration and kernel launcher API contracts
Files: cpp/CMakeLists.txt, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh (pragma once), cpp/src/routing/ges/compute_fragment_ejections.cuh
Summary: Removes direct CUDA-flag mutations from CMake and updates cuopt's exported include paths; the headers add templated get_launch_dims_* and launch_* declarations for the feasibility-jump and routing helpers.

Layer: Feasibility-jump launcher implementation and instantiation
Files: cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
Summary: Adds occupancy-based get_launch_dims_* helpers and launch_* wrapper implementations (some using cooperative launches) and extends the explicit-instantiation macro to instantiate the new helpers for the int/float/double combinations.

Layer: Feasibility-jump launch dimension computation refactoring
Files: cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu (lines 68–96)
Summary: Replaces the generic occupancy call with kernel-specific get_launch_dims_*_kernel helpers for assignment, MTM resets, local-minimum handling, load balancing, and related stages.

Layer: Feasibility-jump kernel dispatch modernization
Files: cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu (lines 437–912)
Summary: Refactors initialization, load-balancing CSR setup, scoring, MTM move resets, variable selection, local-minimum handling, assignment/constraint updates, LHS/violation refresh, and best-solution updates to use launch_* wrappers with pre-packed void** argument arrays instead of raw <<<>>> or direct cooperative launches.

Layer: Routing insertion-ejection launcher infrastructure
Files: cpp/src/routing/ges/compute_fragment_ejections.cu, cpp/src/routing/ges/compute_fragment_ejections.cuh, cpp/src/routing/ges/execute_insertion.cu
Summary: Adds set_shmem_for_kernel_get_best_insertion_ejection_solution and launch_kernel_get_best_insertion_ejection_solution helpers, explicitly instantiated for BLOCK_SIZE ∈ {32, 64, 128, 512} and request types {PDP, VRP}; refactors execute_best_insertion_ejection_solution to prepare the kernel arguments and call the launcher wrapper.
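The pattern described in the walkthrough (template launchers declared in a header, then defined and explicitly instantiated in the .cu that also contains the kernel) can be sketched in host-only C++. In the real change the declaration and definition live in different files; this single-file sketch only marks that boundary with comments. `request_t` mirrors the name used in the review, but `launch_fake` and `CUOPT_SKETCH_INSTANTIATE` are hypothetical, and the body merely computes a block count where the real helper would call cudaLaunchKernel.

```cpp
#include <cassert>
#include <cstddef>

enum class request_t { PDP, VRP };

// --- would live in the header (.cuh): declaration only ----------------------
// Callers in other translation units see only this declaration, so they call
// the wrapper instead of instantiating or launching the kernel themselves.
template <int BLOCK_SIZE, typename i_t, typename f_t, request_t REQ>
std::size_t launch_fake(std::size_t n_items);

// --- would live in the .cu that also defines the kernel ---------------------
template <int BLOCK_SIZE, typename i_t, typename f_t, request_t REQ>
std::size_t launch_fake(std::size_t n_items)
{
  // Stand-in for sizing the grid and launching: number of BLOCK_SIZE-wide
  // blocks needed to cover n_items (ceiling division).
  return (n_items + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

// Explicit instantiation definitions: emit exactly the combinations that
// other translation units are allowed to link against.
#define CUOPT_SKETCH_INSTANTIATE(BLOCK_SIZE, REQ) \
  template std::size_t launch_fake<BLOCK_SIZE, int, float, request_t::REQ>(std::size_t n_items);

CUOPT_SKETCH_INSTANTIATE(32, PDP)
CUOPT_SKETCH_INSTANTIATE(64, PDP)
CUOPT_SKETCH_INSTANTIATE(128, PDP)
CUOPT_SKETCH_INSTANTIATE(512, PDP)
CUOPT_SKETCH_INSTANTIATE(32, VRP)
CUOPT_SKETCH_INSTANTIATE(64, VRP)
CUOPT_SKETCH_INSTANTIATE(128, VRP)
CUOPT_SKETCH_INSTANTIATE(512, VRP)

#undef CUOPT_SKETCH_INSTANTIATE
```

For example, `launch_fake<128, int, float, request_t::VRP>(1000)` returns 8. The design point is that each kernel instantiation and the code that launches it end up in the same translation unit, while other files depend only on the declared wrapper.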

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

non-breaking, improvement

Suggested reviewers

  • akifcorduk
  • rgsl888prabhu
  • nguidotti
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check: ✅ Passed. The title accurately and concisely describes the main change: removing CUDA whole compilation programming model violations by eliminating problematic compiler flags and refactoring kernel launches.
Description check: ✅ Passed. The description is directly related to the changeset, explaining the CUDA whole compilation model violation and the removal of the problematic flag as the core solution.
Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh (1)

1-8: ⚡ Quick win

Add header guard to prevent multiple-inclusion issues.

This .cuh header file lacks include guards (#pragma once or #ifndef/#define), which could lead to redefinition errors if it is included more than once within the same translation unit (directly or through other headers). As per coding guidelines, C++ headers (including .cuh CUDA headers) must use header guards.

🛡️ Proposed fix

Add at the beginning of the file (after the license header, before line 8):

 /* clang-format on */
 
+#pragma once
+
 #include "feasibility_jump.cuh"

As per coding guidelines: "Use #define guards on C++ headers"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh` around
lines 1 - 8, add a header guard to prevent multiple inclusion of this CUDA
header: either wrap the entire contents of feasibility_jump_kernels.cuh in a
conventional include guard macro (e.g., FEASIBILITY_JUMP_KERNELS_CUH) using
#ifndef/#define at the top (after the license comment) and a matching #endif at
the end, or add a single #pragma once at the top. If you use a macro guard,
make sure its name is unique and used consistently so that the symbols declared
in this header (including those from the included "feasibility_jump.cuh") are
not redefined when the header is included multiple times.
cpp/src/routing/ges/compute_fragment_ejections.cu (1)

140-163: ⚡ Quick win

Prefix the macro with CUOPT_ to satisfy project macro naming.

INSTANTIATE_GET_BEST_INSERTION_EJECTION should be renamed with the required CUOPT_ prefix, with its #undef and call sites updated to match.

Suggested change
-#define INSTANTIATE_GET_BEST_INSERTION_EJECTION(BLOCK_SIZE, REQ)                                  \
+#define CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(BLOCK_SIZE, REQ)                            \
   template bool set_shmem_for_kernel_get_best_insertion_ejection_solution<BLOCK_SIZE,             \
                                                                           int,                    \
                                                                           float,                  \
                                                                           request_t::REQ>(        \
     size_t dynamic_shmem_size);                                                                   \
   template void                                                                                   \
   launch_kernel_get_best_insertion_ejection_solution<BLOCK_SIZE, int, float, request_t::REQ>(     \
     dim3 grid,                                                                                    \
     dim3 blocks,                                                                                  \
     size_t shmem_bytes,                                                                           \
     void** kernel_args,                                                                           \
     rmm::cuda_stream_view stream);

-INSTANTIATE_GET_BEST_INSERTION_EJECTION(32, PDP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(64, PDP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(128, PDP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(512, PDP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(32, VRP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(64, VRP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(128, VRP)
-INSTANTIATE_GET_BEST_INSERTION_EJECTION(512, VRP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(32, PDP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(64, PDP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(128, PDP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(512, PDP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(32, VRP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(64, VRP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(128, VRP)
+CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(512, VRP)

-#undef INSTANTIATE_GET_BEST_INSERTION_EJECTION
+#undef CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION

As per coding guidelines: "Project macros must use SCREAMING_SNAKE_CASE with CUOPT_ prefix."

🤖 Prompt for AI Agents

In `@cpp/src/routing/ges/compute_fragment_ejections.cu` around lines 140 - 163,
Rename the macro INSTANTIATE_GET_BEST_INSERTION_EJECTION to follow project
naming by prefixing it with CUOPT_, update all call sites and the corresponding
`#undef` to CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION, and ensure the
instantiations (e.g., CUOPT_INSTANTIATE_GET_BEST_INSERTION_EJECTION(32, PDP),
etc.) still refer to the same templates
set_shmem_for_kernel_get_best_insertion_ejection_solution and
launch_kernel_get_best_insertion_ejection_solution with request_t::PDP and
request_t::VRP; adjust only the macro identifier and its undef so usages and
generated template instantiations remain unchanged.
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu`:
- Around lines 755-759: The debug validation path launches
compute_mtm_moves_kernel<i_t, f_t, MTMMoveType::FJ_MTM_VIOLATED, false> with
the launch dims computed for the binary <..., true> specialization
(grid_resetmoves_bin, blocks_resetmoves_bin). Pass the non-binary occupancy dims
(grid_resetmoves, blocks_resetmoves) at that call site instead, so the
cooperative/debug launch uses the grid and block sizes computed for its own
specialization.

In `@cpp/src/routing/ges/compute_fragment_ejections.cu`:
- Around line 131-133: Replace the C-style cast used for the kernel symbol in
the cudaLaunchKernel call with a C++ cast: locate the cudaLaunchKernel
invocation that passes
(void*)kernel_get_best_insertion_ejection_solution<BLOCK_SIZE, i_t, f_t,
REQUEST> and change the cast to use reinterpret_cast<void*> on the
kernel_get_best_insertion_ejection_solution template instantiation; leave the
rest of the cudaLaunchKernel arguments unchanged.
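The first inline comment matters because occupancy is a property of each kernel instantiation (register and shared-memory use can differ between the <..., true> and <..., false> specializations), and a cooperative launch is rejected if its grid exceeds the number of blocks that can be co-resident. The host-only sketch below shows that arithmetic; the occupancy numbers are made up, and `coop_dims_t`, `cooperative_launch_dims`, and `fits_cooperatively` are hypothetical helpers, not the PR's get_launch_dims_* functions.

```cpp
#include <cassert>

// Hypothetical launch dimensions (host-side stand-in for a grid/block dim3 pair).
struct coop_dims_t {
  unsigned grid;
  unsigned block;
};

// Largest grid that can be fully co-resident for a cooperative launch.
// In real CUDA code, blocks_per_sm would come from
// cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, kernel, block, shmem)
// for the *specific* kernel instantiation, and num_sms from
// cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, device).
coop_dims_t cooperative_launch_dims(int num_sms, int blocks_per_sm, unsigned block)
{
  assert(num_sms > 0 && blocks_per_sm > 0);
  return {static_cast<unsigned>(num_sms * blocks_per_sm), block};
}

// A cooperative launch is only valid if every block can be resident at once.
bool fits_cooperatively(coop_dims_t dims, int num_sms, int blocks_per_sm)
{
  return dims.grid <= static_cast<unsigned>(num_sms * blocks_per_sm);
}
```

With a hypothetical 100 SMs, where one specialization fits 4 blocks per SM and the other only 2, dims computed for the first (grid 400) do not fit the second (capacity 200), which is the kind of mismatch the comment warns about.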


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d28d4dbe-5408-4e42-bacf-eff3c9ce00f1

📥 Commits

Reviewing files that changed from the base of the PR and between 7da0bda and e958f81.

📒 Files selected for processing (7)
  • cpp/CMakeLists.txt
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh
  • cpp/src/routing/ges/compute_fragment_ejections.cu
  • cpp/src/routing/ges/compute_fragment_ejections.cuh
  • cpp/src/routing/ges/execute_insertion.cu
💤 Files with no reviewable changes (1)
  • cpp/CMakeLists.txt

Previously we tried to launch kernels from `feasibility_jump` that are compiled into `feasibility_jump_kernels`, which isn't allowed by the CUDA whole compilation programming model.

We remove the `static-global-template-stub=false` flag to enforce valid CUDA programs going forward.
robertmaynard force-pushed the remove_CUDA_whole_compilation_violations branch from e958f81 to 84352db on May 29, 2026 19:40
coderabbitai Bot left a comment

🧹 Nitpick comments (2)
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu (2)

787-790: ⚡ Quick win

Don't discard the helper-computed grid for launch_update_changed_constraints_kernel.

This path computes grid_update_changed_constraints at lines 667-668 but then launches with dim3(1) anyway. That bypasses the new helper-based sizing and can collapse a hot-path kernel to single-block execution.

Proposed fix
       launch_update_assignment_kernel<i_t, f_t>(
         grid_setval, blocks_setval, update_assignment_args, climber_stream);
       launch_update_changed_constraints_kernel<i_t, f_t>(
-        dim3(1), blocks_update_changed_constraints, kernel_args, climber_stream);
+        grid_update_changed_constraints,
+        blocks_update_changed_constraints,
+        kernel_args,
+        climber_stream);
🤖 Prompt for AI Agents

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 787
- 790, The call to launch_update_changed_constraints_kernel currently forces a
single-block launch (dim3(1)) and ignores the helper-computed grid size
grid_update_changed_constraints; replace the hardcoded dim3(1) with the computed
grid_update_changed_constraints so the kernel uses the helper-based sizing
(i.e., call
launch_update_changed_constraints_kernel<i_t,f_t>(grid_update_changed_constraints,
blocks_update_changed_constraints, kernel_args, climber_stream)), ensuring you
keep the existing blocks_update_changed_constraints and kernel_args parameters.
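To show what "collapse to single-block execution" costs, the sketch below assumes the kernel walks its items with a grid-stride loop. That is an assumption: the kernel body is not shown in this thread. Under that assumption, dim3(1) still visits every item but gives each thread far more iterations. The functions are hypothetical host-side emulations, not cuOpt code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterations performed by the busiest thread (global index 0) of a grid-stride loop:
//   for (i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
std::size_t iterations_per_thread(std::size_t n_items, std::size_t grid, std::size_t block)
{
  assert(grid > 0 && block > 0);
  const std::size_t stride = grid * block;  // total threads in the launch
  return (n_items + stride - 1) / stride;   // ceil(n_items / stride)
}

// Host emulation of the same loop: every item is visited exactly once,
// whatever the grid size, so a too-small grid costs speed, not correctness.
bool covers_all_items_once(std::size_t n_items, std::size_t grid, std::size_t block)
{
  std::vector<int> visits(n_items, 0);
  const std::size_t stride = grid * block;
  for (std::size_t b = 0; b < grid; ++b)
    for (std::size_t t = 0; t < block; ++t)
      for (std::size_t i = b * block + t; i < n_items; i += stride)
        ++visits[i];
  for (int v : visits)
    if (v != 1) return false;
  return true;
}
```

For 100000 items with 256-thread blocks, one block means 391 iterations per thread, while a hypothetical 400-block grid needs only 1.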

539-553: ⚡ Quick win

Use the cached work-id mapping launch dims here.

load_balancing_workid_map_launch_dims is already computed in the constructor, but these dispatches still hardcode 4096x128. That leaves this kernel outside the new centralized launch-sizing path and makes future tuning drift between setup and launch.

Proposed fix
+  auto [grid_load_balancing_workid_map, blocks_load_balancing_workid_map] =
+    load_balancing_workid_map_launch_dims;
+
   if (pb_ptr->binary_indices.size() > 0) {
     auto row_size_prefix_sum = view.row_size_bin_prefix_sum;
     auto var_indices         = view.pb.binary_indices;
     auto work_id_to_var_idx  = view.work_id_to_bin_var_idx;
     void* args[]             = {&view, &row_size_prefix_sum, &var_indices, &work_id_to_var_idx};
     launch_load_balancing_compute_workid_mappings<i_t, f_t>(
-      dim3(4096), dim3(128), args, climber_stream);
+      grid_load_balancing_workid_map, blocks_load_balancing_workid_map, args, climber_stream);
   }
   if (pb_ptr->nonbinary_indices.size() > 0) {
     auto row_size_prefix_sum = view.row_size_nonbin_prefix_sum;
     auto var_indices         = view.pb.nonbinary_indices;
     auto work_id_to_var_idx  = view.work_id_to_nonbin_var_idx;
     void* args[]             = {&view, &row_size_prefix_sum, &var_indices, &work_id_to_var_idx};
     launch_load_balancing_compute_workid_mappings<i_t, f_t>(
-      dim3(4096), dim3(128), args, climber_stream);
+      grid_load_balancing_workid_map, blocks_load_balancing_workid_map, args, climber_stream);
   }
🤖 Prompt for AI Agents

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 539
- 553, The two kernel launches that call
launch_load_balancing_compute_workid_mappings<i_t,f_t> use hardcoded dim3(4096),
dim3(128); replace those with the cached launch dimensions
load_balancing_workid_map_launch_dims (its grid and block dimensions, e.g.
unpacked with a structured binding as in the proposed fix) so that both the
binary and nonbinary dispatches use the cached values instead of the hardcoded
dim3 values, leaving the args and stream unchanged.
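The proposed fix unpacks load_balancing_workid_map_launch_dims with a structured binding. Its actual type is not shown in this thread; assuming it is a plain aggregate holding a grid and a block dimension in that order, the structured binding and named member access refer to the same two fields, as the host-only sketch below checks. (If it were a std::pair, only `.first`/`.second` or the structured binding would work.) `dims3`, `launch_dims_t`, and both functions are hypothetical.

```cpp
#include <cassert>

// Host stand-in for CUDA's dim3 (which defaults unspecified extents to 1).
struct dims3 {
  unsigned x = 1, y = 1, z = 1;
};

// Hypothetical cached launch-dimension type with named members.
struct launch_dims_t {
  dims3 grid;
  dims3 block;
};

// Hypothetical launcher stand-in: total number of threads it would launch.
unsigned long long total_threads(dims3 grid, dims3 block)
{
  return 1ull * grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

// Each dispatch site reads the value cached once (e.g. in a constructor).
unsigned long long dispatch_with_cached_dims(const launch_dims_t& cached)
{
  auto [grid, blocks] = cached;  // binds cached.grid and cached.block, in declaration order
  assert(grid.x == cached.grid.x && blocks.x == cached.block.x);
  return total_threads(grid, blocks);
}
```

Caching the dims in one place means a future tuning change to the setup code automatically reaches both the binary and nonbinary dispatches.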

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48ce274a-58e6-4355-aa5a-157dcda573f7

📥 Commits

Reviewing files that changed from the base of the PR and between e958f81 and 84352db.

📒 Files selected for processing (7)
  • cpp/CMakeLists.txt
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh
  • cpp/src/routing/ges/compute_fragment_ejections.cu
  • cpp/src/routing/ges/compute_fragment_ejections.cuh
  • cpp/src/routing/ges/execute_insertion.cu
💤 Files with no reviewable changes (1)
  • cpp/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/src/routing/ges/compute_fragment_ejections.cu
  • cpp/src/routing/ges/execute_insertion.cu
  • cpp/src/routing/ges/compute_fragment_ejections.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh

bdice (Contributor) left a comment

Thanks @robertmaynard!


Labels

None yet

Projects

None yet


2 participants