Latest STA changes 06/29#379
Conversation
…lver (#450)
This commit refactors the Dartu-Menezes-Pileggi (DMP) effective capacitance
algorithm solver to use the Eigen library, and implements a Determinant Guarded
solver to handle singular and near-singular Jacobians safely and efficiently.
Key changes:
1. Refactored the DMP solver to use Eigen stack-based data structures,
replacing the custom Crout LU decomposition and solver.
2. Extracted the linear solver logic into a dedicated helper function
DmpAlg::solveNewtonStep.
3. Implemented a "Determinant Guarded" solver:
- Manually checks the determinant of the Jacobian (safety guard).
- Throws a DmpError if the determinant is dangerously close to zero (|det| < 1e-12),
safely triggering the lumped capacitance fallback.
- Otherwise, solves using Eigen's highly optimized analytical inverse (fast path).
4. Added documentation in the comments on how to easily swap back to LU
decomposition with partial pivoting (PartialPivLU) if any numerical
precision issues arise in the future.
Benchmark Results (Optimized Release Build, 5,288 DMP calls):
| Test Case | Calls | 1. Original (Crout LU) | 2. Eigen (PartialPivLU) | 3. Eigen (Determinant Guarded) | Speedup (3 vs 1) | Speedup (3 vs 2) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| power | 2,608 | 1.8822 ms | 1.6791 ms | 1.2702 ms | +32.5% | +24.4% |
| power_vcd | 2,608 | 1.8729 ms | 1.6704 ms | 1.2768 ms | +31.8% | +23.6% |
| spef_parasitics | 24 | 0.0193 ms | 0.0187 ms | 0.0140 ms | +27.5% | +25.1% |
| mcmm3 | 48 | 0.0728 ms | 0.0756 ms | 0.0817 ms | -12.2% | -8.1% |
| Total DMP Time | 5,288 | 3.8472 ms | 3.4438 ms | 2.6427 ms | +31.3% | +23.3% |
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
|
James Cherry seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
There was a problem hiding this comment.
Code Review
This pull request refactors the Newton-Raphson solver in DmpCeff to use Eigen, optimizes generated clock and latch arrival propagation by postponing latch evaluations, introduces new graph traversal helpers, and ensures input delay reference pin edges survive graph rebuilds. The review feedback identifies several critical issues: a potential stale value bug in DmpCeff when nr_order is not 3, an uninitialized pointer and double-free risk in PathEnumFaninVisitor due to an empty copy constructor, numerical instability in the determinant-based singularity check, potential null pointer dereferences in findDelays and visitFanouts, and a division-by-zero risk in Power::findActivity if the clock period is zero.
| Eigen::Vector3d x = Eigen::Vector3d::Zero(); | ||
| if (nr_order_ == 3) | ||
| x_[DmpParam::ceff] = ceff; | ||
| x[DmpParam::ceff] = ceff; |
There was a problem hiding this comment.
When nr_order_ is not 3 (e.g., nr_order_ == 2 for DmpOnePole), ceff_ is never updated to the input ceff before newtonRaphson(x) is called. Since DmpOnePole::evalDmpEqns relies on ceff_, this leads to stale or uninitialized values being used during the Newton-Raphson iterations. Setting ceff_ = ceff; in the else branch resolves this issue.
| Eigen::Vector3d x = Eigen::Vector3d::Zero(); | |
| if (nr_order_ == 3) | |
| x_[DmpParam::ceff] = ceff; | |
| x[DmpParam::ceff] = ceff; | |
| Eigen::Vector3d x = Eigen::Vector3d::Zero(); | |
| if (nr_order_ == 3) | |
| x[DmpParam::ceff] = ceff; | |
| else | |
| ceff_ = ceff; |
| { | ||
| } | ||
|
|
||
| PathEnumFaninVisitor::~PathEnumFaninVisitor() | ||
| { | ||
| delete pred_; | ||
| } |
There was a problem hiding this comment.
The custom copy constructor PathEnumFaninVisitor::PathEnumFaninVisitor(const PathEnumFaninVisitor &visitor) is empty and does not call the base class copy constructor. This causes the base class to be default-constructed, leaving pred_ uninitialized or null in the copied visitor, which will cause crashes when copy() is called. Additionally, manually deleting pred_ in the destructor will cause a double-free if the base class copy constructor is called and copies the pointer. To fix this robustly, consider passing true for the delete_pred parameter in the PathVisitor constructor so PathVisitor manages the lifetime of pred_, removing the custom destructor entirely, and removing the custom copy constructor entirely to let the compiler generate a correct member-wise copy constructor.
| if (nr_order_ == 2) { | ||
| double det = fjac.topLeftCorner<2, 2>().determinant(); | ||
| if (std::abs(det) < 1e-12) { | ||
| throw DmpError("Jacobian is singular (order 2)"); | ||
| } | ||
| p.head<2>() = fjac.topLeftCorner<2, 2>().inverse() * -fvec.head<2>(); | ||
| return p; | ||
| } | ||
| } | ||
|
|
||
| // Solves fjac_ * x = p_ for x, assuming fjac_ is LU form from luDecomp. | ||
| // Solution overwrites p_. | ||
| void | ||
| DmpAlg::luSolve() | ||
| { | ||
| const int size = nr_order_; | ||
|
|
||
| // Transform p_ allowing for leading zeros. | ||
| int non_zero = -1; | ||
| for (int i = 0; i < size; i++) { | ||
| int iperm = index_[i]; | ||
| double sum = p_[iperm]; | ||
| p_[iperm] = p_[i]; | ||
| if (non_zero != -1) { | ||
| for (int j = non_zero; j <= i - 1; j++) | ||
| sum -= fjac_[i][j] * p_[j]; | ||
| } | ||
| else { | ||
| if (sum != 0.0) | ||
| non_zero = i; | ||
| } | ||
| p_[i] = sum; | ||
| } | ||
| // Backsubstitution. | ||
| for (int i = size - 1; i >= 0; i--) { | ||
| double sum = p_[i]; | ||
| for (int j = i + 1; j < size; j++) | ||
| sum -= fjac_[i][j] * p_[j]; | ||
| p_[i] = sum / fjac_[i][i]; | ||
| double det = fjac.determinant(); | ||
| if (std::abs(det) < 1e-12) { | ||
| throw DmpError("Jacobian is singular (order 3)"); | ||
| } | ||
| p = fjac.inverse() * -fvec; | ||
| return p; | ||
| } |
There was a problem hiding this comment.
The absolute determinant check std::abs(det) < 1e-12 is highly sensitive to the scaling of the matrix elements, which depend on physical units (e.g., Farads and seconds). If the units result in small values in fjac, the determinant can easily fall below 1e-12 even for a well-conditioned matrix, causing unexpected DmpError failures. Using a robust LU decomposition with partial pivoting (as documented in your comments) is much safer and numerically stable.
| if (nr_order_ == 2) { | |
| double det = fjac.topLeftCorner<2, 2>().determinant(); | |
| if (std::abs(det) < 1e-12) { | |
| throw DmpError("Jacobian is singular (order 2)"); | |
| } | |
| p.head<2>() = fjac.topLeftCorner<2, 2>().inverse() * -fvec.head<2>(); | |
| return p; | |
| } | |
| } | |
| // Solves fjac_ * x = p_ for x, assuming fjac_ is LU form from luDecomp. | |
| // Solution overwrites p_. | |
| void | |
| DmpAlg::luSolve() | |
| { | |
| const int size = nr_order_; | |
| // Transform p_ allowing for leading zeros. | |
| int non_zero = -1; | |
| for (int i = 0; i < size; i++) { | |
| int iperm = index_[i]; | |
| double sum = p_[iperm]; | |
| p_[iperm] = p_[i]; | |
| if (non_zero != -1) { | |
| for (int j = non_zero; j <= i - 1; j++) | |
| sum -= fjac_[i][j] * p_[j]; | |
| } | |
| else { | |
| if (sum != 0.0) | |
| non_zero = i; | |
| } | |
| p_[i] = sum; | |
| } | |
| // Backsubstitution. | |
| for (int i = size - 1; i >= 0; i--) { | |
| double sum = p_[i]; | |
| for (int j = i + 1; j < size; j++) | |
| sum -= fjac_[i][j] * p_[j]; | |
| p_[i] = sum / fjac_[i][i]; | |
| double det = fjac.determinant(); | |
| if (std::abs(det) < 1e-12) { | |
| throw DmpError("Jacobian is singular (order 3)"); | |
| } | |
| p = fjac.inverse() * -fvec; | |
| return p; | |
| } | |
| if (nr_order_ == 2) { | |
| auto lu = fjac.topLeftCorner<2, 2>().partialPivLu(); | |
| if (std::abs(lu.matrixLU().diagonal().prod()) < 1e-12) { | |
| throw DmpError("Jacobian is singular (order 2)"); | |
| } | |
| p.head<2>() = lu.solve(-fvec.head<2>()); | |
| return p; | |
| } | |
| auto lu = fjac.partialPivLu(); | |
| if (std::abs(lu.matrixLU().diagonal().prod()) < 1e-12) { | |
| throw DmpError("Jacobian is singular (order 3)"); | |
| } | |
| p = lu.solve(-fvec); | |
| return p; |
| iter_->ensureSize(); | ||
| if (incremental_) | ||
| seedInvalidDelays(); | ||
| Stats stats(debug_, report_); |
There was a problem hiding this comment.
| void | ||
| Graph::visitFanouts(Vertex *vertex, | ||
| SearchPred *pred, | ||
| const VertexFn &fn) | ||
| { | ||
| if (pred->searchFrom(vertex)) { |
There was a problem hiding this comment.
| if (clk) { | ||
| float duty = clockDuty(clk); | ||
| return PwrActivity(2.0 / clk->period(), duty, PwrActivityOrigin::clock); | ||
| } |
There was a problem hiding this comment.
If clk->period() is 0.0, the expression 2.0 / clk->period() will result in a division by zero (producing infinity or NaN). It is safer to guard against a zero period.
if (clk && clk->period() != 0.0) {
float duty = clockDuty(clk);
return PwrActivity(2.0 / clk->period(), duty, PwrActivityOrigin::clock);
}Signed-off-by: James Cherry <cherry@CerezoBook.local>
* make mode/scene pass as tcl objects Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com> * address review comments Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com> * address review comments take 2 Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com> * revert to using cmd mode name Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com> --------- Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
|
CI fails because we need OpenROAD side changes to accomodate these: |
Resolved merge conflict in .github/workflows/ci.yml: pinned actions/checkout to v7 (9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0), keeping SHA-pin style over upstream's @v7 tag.
Replaced deprecated LibertyCell::hasSequentials() with isSequential() (identical impl) at 9 call sites across 5 test files: TestPower.cc, TestLibertyClasses.cc, TestLibertyStaCallbacks.cc, TestLibertyStaBasicsB.cc, TestLibertyStaBasics.cc. Clears all -Wdeprecated-declarations warnings from these files.