Skip to content

gpl: parallelize updateGradients and nesterovUpdateCoordinates#10451

Open
maliberty wants to merge 2 commits into
The-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:gpl-perf
Open

gpl: parallelize updateGradients and nesterovUpdateCoordinates#10451
maliberty wants to merge 2 commits into
The-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:gpl-perf

Conversation

@maliberty
Copy link
Copy Markdown
Member

Summary

Restores OpenMP parallelism to two hot per-iteration loops in NesterovBase, with bit-identical output preserved.

  1. updateGradients (nesterovBase.cpp) — the OMP pragma had been disabled with a TODO explaining that reduction(+: ...) over floats produced non-deterministic sums. The serial loop has been the dominant per-iteration cost ever since. Replaced with a two-phase pattern: parallel per-cell write of gradients + preconditioner into the existing output vectors, then a serial in-order accumulation that preserves the original sum += fabs(x); sum += fabs(y); order. Result is bit-identical to the prior serial version regardless of thread count.
  2. nesterovUpdateCoordinates (nesterovBase.cpp) — the per-cell axpy + layout-clamp loop was missing #pragma omp parallel for even though every iteration writes to distinct indices and uses only const helpers. Added the pragma. Bit-identical (independent writes, no reduction).

Type of Change

  • Refactoring

Impact

2.0× wall-time speedup on large01 (274,700 cells) at 64 threads. Placement output is unchanged: every gpl ctest passes against its committed golden DEF, and benchmark runs at 1 thread, 8 threads, and 64 threads all produce identical HPWL.

large01.tcl, 64 threads, 3 runs per condition:

Stage Run 1 (s) Run 2 (s) Run 3 (s) Mean (s) vs master
master 139.61 138.91 137.69 138.74
+grad OMP (#1) 76.51 77.95 72.90 75.79 1.83× / −45.4%
+coordi OMP (#2) 69.93 69.53 69.15 69.54 2.00× / −49.9%

Final HPWL = 12,050,134,762 across all 9 runs — placement is bit-identical at every stage.

Verification

  • I have verified that the local build succeeds (./etc/Build.sh).
  • I have run the relevant tests and they pass.
  • My code follows the repository's formatting guidelines.
  • I have signed my commits (DCO).

Full gpl ctest suite: 62/62 passing. Each test diff-compares against a committed golden DEF, so the existing tests are themselves the bit-identical regression for this change. Additional ad-hoc determinism check: convergence01.tcl produces a DEF byte-for-byte identical to the committed golden at 1 thread and at 8 threads after each commit.

Related Issues

None.

maliberty added 2 commits May 17, 2026 06:32
The OpenMP parallel-for in NesterovBase::updateGradients was disabled
because reduction(+: ...) over floats has an unspecified combine order
across threads, yielding bit-different sums between runs and between
thread counts. Tests rely on golden DEFs, so the loop ran serially.

Split the body in two:

  1. Parallel per-cell phase computing wireLengthGrads[i], densityGrads[i],
     and sumGrads[i] (independent writes to distinct indices). This is the
     expensive work — WA gradient, density gradient, preconditioner.
  2. Serial in-order phase accumulating wireLengthGradSum_,
     densityGradSum_, and gradSum. Order matches the original loop, so
     output is bit-identical to the prior serial version regardless of
     thread count.

Verified by running convergence01.tcl at 1 and 8 threads and diffing the
resulting DEFs against each other and the committed golden — all three
identical. Full gpl ctest suite (62 tests) passes.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
The axpy-and-clamp loop at the top of NesterovBase::nesterovUpdateCoordinates
was serial. Each iteration writes to nextCoordi_[k] and nextSLPCoordi_[k] at
distinct indices, with no cross-iteration dependencies and no float reduction,
so it's trivially parallel. The clamping helpers
(getDensityCoordiLayoutInsideX/Y) are const and read only bin-grid bounds.

Output is bit-identical to the serial version regardless of thread count:
convergence01.tcl produces the same DEF at 1 and 8 threads, matching the
committed golden. Full gpl ctest suite (62 tests) passes.

The loop runs once per backtracking retry inside doBackTracking, scaling
linearly with cell count, so the win grows with design size.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@maliberty maliberty requested a review from a team as a code owner May 17, 2026 17:28
@maliberty maliberty requested a review from gudeh May 17, 2026 17:28
@maliberty maliberty self-assigned this May 17, 2026
@github-actions
Copy link
Copy Markdown
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the gradient update logic in NesterovBase::updateGradients into a two-phase process—parallel computation followed by serial reduction—to ensure deterministic results across different thread counts. Additionally, it parallelizes the coordinate update loop in nesterovUpdateCoordinates. I have no feedback to provide as the review comments were purely explanatory or did not identify actionable issues.

@maliberty
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant