Remove pinned host memory from barrier solver #1321

Draft
rg20 wants to merge 1 commit into NVIDIA:release/26.06 from rg20:remove_pinned_memory

@rg20
Contributor

@rg20 rg20 commented May 28, 2026

Replace all pinned_dense_vector_t members in iteration_data_t with plain dense_vector_t, eliminating the CPU<->GPU synchronization overhead incurred when allocating page-locked (pinned) host memory. Net change: 169 lines removed.
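For readers unfamiliar with the two vector types, the swap can be pictured as a change of allocator on an otherwise identical container. The sketch below is an illustration under assumptions, not the cuOpt source: `mock_pinned_allocator`, `dense_vector_sketch_t`, `pinned_dense_vector_sketch_t`, and both `iteration_data_*` structs are hypothetical names, and the allocator uses `malloc`/`free` with a call counter where a real pinned allocator would presumably call `cudaMallocHost`/`cudaFreeHost`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Counts calls to the stand-in pinned allocator (illustration only).
static int g_pinned_allocs = 0;

// Hypothetical stand-in for a page-locked allocator. A real one would call
// cudaMallocHost/cudaFreeHost; malloc/free keep this sketch CUDA-free.
template <typename T>
struct mock_pinned_allocator {
  using value_type = T;

  mock_pinned_allocator() = default;
  template <typename U>
  mock_pinned_allocator(const mock_pinned_allocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    ++g_pinned_allocs;  // each call here models one expensive pinned allocation
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) { throw std::bad_alloc(); }
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(const mock_pinned_allocator<T>&, const mock_pinned_allocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const mock_pinned_allocator<T>&, const mock_pinned_allocator<U>&) noexcept {
  return false;
}

// Assumed shape of the two vector types: same container, different allocator.
template <typename f_t, typename Alloc = std::allocator<f_t>>
using dense_vector_sketch_t = std::vector<f_t, Alloc>;

template <typename f_t>
using pinned_dense_vector_sketch_t = dense_vector_sketch_t<f_t, mock_pinned_allocator<f_t>>;

// Before: every member goes through the pinned allocator.
struct iteration_data_before {
  pinned_dense_vector_sketch_t<double> x;
  pinned_dense_vector_sketch_t<double> primal_rhs;
  iteration_data_before(std::size_t n, std::size_t m) : x(n, 0.0), primal_rhs(m, 0.0) {}
};

// After: the same members live in ordinary pageable host memory.
struct iteration_data_after {
  dense_vector_sketch_t<double> x;
  dense_vector_sketch_t<double> primal_rhs;
  iteration_data_after(std::size_t n, std::size_t m) : x(n, 0.0), primal_rhs(m, 0.0) {}
};
```

Constructing `iteration_data_before` goes through the counted allocator for each member, while `iteration_data_after` never touches it. In this picture the change is confined to the member declarations, and code that only reads or writes elements is unaffected.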

Vectors removed (pinned -> plain or deleted entirely):

  • 10 direction vectors (dw_aff, dx_aff, dy_aff, dv_aff, dz_aff and their corrector counterparts)
  • 5 RHS vectors (primal_rhs, bound_rhs, dual_rhs, complementarity_xz_rhs, complementarity_wv_rhs)
  • 5 residual vectors (primal_residual, bound_residual, dual_residual, complementarity_xz_residual, complementarity_wv_residual)
  • diag, inv_diag, inv_sqrt_diag (CPU-only, converted to dense_vector_t)
  • c, b (constants, converted; permanent d_b_ added to avoid per-iteration device_copy in compute_primal_dual_objective)
  • restrict_u_ (converted; permanent d_restrict_u_ added, copied once)
  • w, x, y, v, z, upper_bounds (state vectors, converted)
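The permanent `d_b_` and `d_restrict_u_` buffers follow a familiar pattern: copy loop-invariant host data to the device once and reuse it, rather than copying on every iteration. A minimal sketch of that pattern follows, assuming nothing about the real cuOpt code: `mock_device_vector`, `mock_device_copy`, `mock_device_dot`, and `objective_cache` are hypothetical, the "device" is plain host memory with a copy counter, and the dot product merely stands in for whatever `compute_primal_dual_objective` does with `b`.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a device buffer; a real one would own GPU memory.
struct mock_device_vector {
  std::vector<double> data;
};

// Counts simulated host-to-device transfers (illustration only).
static int g_h2d_copies = 0;

// Stand-in for a host-to-device copy.
mock_device_vector mock_device_copy(const std::vector<double>& host) {
  ++g_h2d_copies;
  return mock_device_vector{host};
}

// Stand-in for a device-side dot product; assumes equal lengths.
double mock_device_dot(const mock_device_vector& a, const mock_device_vector& b) {
  return std::inner_product(a.data.begin(), a.data.end(), b.data.begin(), 0.0);
}

// Before: b is transferred on every call, that is, once per iteration.
double objective_term_copy_each_call(const std::vector<double>& b,
                                     const mock_device_vector& d_y) {
  mock_device_vector d_b = mock_device_copy(b);
  return mock_device_dot(d_b, d_y);
}

// After: b is constant, so it is transferred once and the buffer is kept.
struct objective_cache {
  mock_device_vector d_b_;
  explicit objective_cache(const std::vector<double>& b) : d_b_(mock_device_copy(b)) {}

  double objective_term(const mock_device_vector& d_y) const {
    return mock_device_dot(d_b_, d_y);  // no transfer on this path
  }
};
```

Calling `objective_term_copy_each_call` k times performs k simulated transfers, while one `objective_cache` performs a single transfer regardless of k. The trade-off is a buffer that stays allocated for the lifetime of the solve.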

Also removes the CPU compute_residuals function entirely (replaced by the gpu_compute_residuals path) and simplifies the gpu_compute_search_direction signature by removing unused pinned vector parameters.

Validated on 179 benchmark problems (portfolio/maros/qplib): identical results vs baseline under --cudss-deterministic true.
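One way to make "identical results" checkable is a bit-for-bit comparison of the solution vectors from the baseline and the patched build. The helper below is a sketch under that assumption; the PR does not say how the comparison was done, and `bitwise_identical` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Returns true only when both vectors have the same length and every element
// has the same bit pattern. Stricter than operator==: 0.0 and -0.0 differ.
bool bitwise_identical(const std::vector<double>& a, const std::vector<double>& b) {
  if (a.size() != b.size()) { return false; }
  if (a.empty()) { return true; }
  return std::memcmp(a.data(), b.data(), a.size() * sizeof(double)) == 0;
}
```

A check this strict is only meaningful when the run itself is reproducible, which is presumably why the comparison was made under `--cudss-deterministic true`.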

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rg20 rg20 requested a review from a team as a code owner May 28, 2026 19:52
@rg20 rg20 requested review from akifcorduk and hlinsen May 28, 2026 19:52
@copy-pr-bot

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 marked this pull request as draft May 28, 2026 19:52
@yuwenchen95
Contributor

Would this be with release/26.06 or postponed to the next release?

@rg20 rg20 changed the base branch from main to release/26.06 May 29, 2026 15:17
@rg20 rg20 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels May 29, 2026
@rg20 rg20 added this to the 26.06 milestone May 29, 2026