MBFF clock-tree power credit - promote higher banking ratios#10796
MBFF clock-tree power credit - promote higher banking ratios#10796dsengupta0628 wants to merge 12 commits into
Conversation
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com> (cherry picked from commit 320a848df4db3d7da0f4f71aa494612a15970a6a)
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a new -clock_power_weight parameter to the multi-bit flip-flop (MBFF) clustering algorithm, allowing users to credit clock-tree power savings and promote higher banking ratios. The changes span the documentation, C++ implementation, TCL interface, and include new integration tests. The feedback suggests adding a defensive check in the C++ MBFF::Run entry point to validate that clock_power_weight is non-negative, preventing potential division-by-zero errors when the function is called directly from other APIs.
| const float beta, | ||
| const float clock_power_weight) | ||
| { | ||
| clock_power_weight_ = clock_power_weight; |
There was a problem hiding this comment.
While clock_power_weight is validated to be non-negative in the TCL interface, it is not validated in the C++ MBFF::Run entry point. If MBFF::Run is called directly (e.g., via Python or other C++ APIs) with a negative clock_power_weight, it can lead to a division by zero in MBFF::SetRatios where single_bit_power_ + p_ct is used as a divisor. Adding a defensive check here using log_->error ensures robust input validation.
if (clock_power_weight < 0.0f) {
log_->error(GPL, 115, "-clock_power_weight must be non-negative.");
}
clock_power_weight_ = clock_power_weight;References
- Calls to
logger_->errorare expected to terminate the program, so additional error handling like returning an error code or throwing an exception is not necessary.
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
|
Do we need an opt-in flag for this behavior. To me MBFF's primary benefit is around power reduction and we should do all we can to reduce power of which clock power is one dimension. My worry here is that this flag just makes the default behavior worse. |
The reason it's opt-in rather than on-by-default is that this change doesn't really measure clock-tree power - it models a proxy, P_ct = k · single_bit_power, where k is a hand-tuned weight with no physically-grounded value. That makes a safe default hard to justify:
So the flag isn't a decision to keep clock-power savings off - it's that turning an un-calibrated proxy on by default, while the timing safeguard is also default-off, isn't defensible without per-design justification. My intention with this was to have a code now with k=0 default (safe, reviewable, unblocks customers who need higher banking today), and open a follow-up to make clock-power credit default via some kind of a measured estimate - deriving P_ct from the actual clock tree (buffer library + fanout + RC) instead of a user knob- haven't explored that yet. But since we did see this code helping with the customer banking ratio already, I figured this is a good representation of giving clock power credit instead of actual estimation. |
Summary
Adds an opt-in cluster_flops -clock_power_weight flag that lets the MBFF clustering ILP credit clock-tree power saved by banking, enabling higher banking ratios on designs where the current cost model stalls.
Problem
cluster_flops can't exceed a certain banking on some customer libraries, while commercial tools can reach almost double of that. Higher banking cuts CTS routing load and yields more accurate post-route power.
Root cause: the clustering ILP (MBFF::RunILP) only models in-cell clock-pin sharing. The dominant real benefit- collapsing N clock-tree sinks (buffers + clock-net wire) down to 1 - is external to the cell and uncounted. With the savings understated, the ILP stops banking early. ORFS libs already show a large per-bit tray-power drop so they're unaffected; customer libs with a modest drop gate hard.
One Solution
Model a per-sink clock-tree power P_ct = k · single_bit_power. An N-bit tray presents one sink regardless of N, so banking eliminates (N−1)·P_ct. Fold into per-tray cost, keeping the 1-bit baseline at 1.0:
tray_cost[N-bit] = (tray_total_power + P_ct) / (single_bit_power + P_ct)
tray_cost[1-bit] = 1.0
Single conceptual edit in MBFF::SetRatios; RunILP already reads norm_power_. No change to displacement/timing terms.
Interface
cluster_flops [-clock_power_weight k] # float, k >= 0, default 0.0
Type of Change
Impact
Improves banking ratio of multi bit flop clustering using the new feature
Verification
./etc/Build.sh).