ternip asap7: hand-placed banked SRAMs + gen_fakeram supply-rail fix → 0 DRC #141

Merged: mguthaus merged 3 commits into main from ternip on May 13, 2026


@mguthaus (Contributor)

Follow-up to #125. Brings ternip asap7 from 1780 to 0 DRC violations through three coordinated changes:

What changed

| Commit | Change | DRC |
|---|---|---|
| 576d1c2 | Bank the three large SRAMs into 18 smaller macros; let RTLMP place them | ~3500 |
| b1a0dfa | Hand-place 10 banked SRAMs in cnn-style MY+R0 arrays; replace the 16-entry × 1024-bit importvector with flip-flops | 693 |
| cf33c2d | Fix three coupled bugs in gen_fakeram.py M4 supply-rail geometry | 0 |

The supply-rail fix (cf33c2d)

gen_fakeram.py produced macros that fundamentally couldn't route at zero DRC because of three coupled bugs in the M4 supply-rail layout:

  1. supply_track_pitch was 0.048 µm, the M4 routing-track grid itself. Every M4 track had a power-rail edge on it, so any signal pin escape had a rail edge inside its Lef58 end-of-line keepout zone. Fixed to 0.384 µm (alternating VDD/VSS).
  2. No explicit supply_rail_w: it fell back to pin_w (0.024 µm), giving rails 4× too thin that aliased to the routing grid. Set to 0.096 µm explicitly.
  3. Supply rails extended into the pin-protrusion x-range: rails spanned x ∈ [0.054, w−0.054] µm while signal pins protrude to x = 0.072 µm, creating a hard M4 short. The fix introduces supply_margin_xy = pin_protrusion + lr_pitch (0.120 µm), leaving a clean 0.048 µm gap.
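The rail-extent fix is easy to check with plain arithmetic. The sketch below is a hypothetical helper, not code from gen_fakeram.py: it works in integer nanometres, takes the 0.054 µm (snap_w) and 0.120 µm (supply_margin_xy) margins and the 0.072 µm pin protrusion from the list above, and compares x-extents only (it assumes the pin's y already falls inside a rail band).

```python
# Hypothetical check, not taken from gen_fakeram.py.
# Integer nanometres avoid floating-point round-off.
LR_PITCH_NM = 48          # M4 routing-track pitch (0.048 um)
PIN_PROTRUSION_NM = 72    # signal pins span x in [0, 0.072 um]

OLD_MARGIN_NM = 54                               # snap_w: rails span [0.054, w - 0.054]
NEW_MARGIN_NM = PIN_PROTRUSION_NM + LR_PITCH_NM  # supply_margin_xy = 0.120 um


def pin_to_rail_gap_nm(margin_nm: int) -> int:
    """Clearance between a pin tip and the left edge of a supply rail.

    A negative value means the rail starts inside the pin's x-range, so a pin
    whose y falls in a rail band overlaps VDD/VSS on M4 (a hard short).
    """
    return margin_nm - PIN_PROTRUSION_NM


for label, margin in (("old", OLD_MARGIN_NM), ("new", NEW_MARGIN_NM)):
    gap = pin_to_rail_gap_nm(margin)
    verdict = "overlaps pin (short)" if gap < 0 else f"{gap} nm clear"
    print(f"{label}: rail left edge at x = {margin} nm, {verdict}")
```

With the old margin the gap is −18 nm (an overlap); with the new one it is 48 nm, exactly one M4 track pitch.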

All three values now match the working liteeth/NVDLA asap7 fakerams.

Final QoR (cf33c2d)

| Metric | Value |
|---|---|
| DRC violations | 0 |
| Die area | 131,461 µm² |
| Utilization | 45.5% |
| Cells (seq + comb + buf/inv) | 131,041 |
| Macros | 10 |
| IOs | 563 |
| Slack | +3243 ps |
| Skew | 96.25 ps |
| Fmax | 0.569 GHz |
| Power | 18.6 mW (clock 7.36 mW) |
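The PR does not state the clock period, but it can be backed out of the slack and Fmax rows if one assumes the flow reports Fmax as the reciprocal of the critical-path delay (period minus worst slack). That assumption is ours, not the flow's documented definition; the sketch below only shows the arithmetic.

```python
def implied_clock_period_ps(fmax_ghz: float, slack_ps: float) -> float:
    """Back out the clock period from Fmax and worst slack.

    Assumption (not stated in the PR): Fmax = 1 / (period - slack), i.e. the
    reported Fmax is the reciprocal of the critical-path delay.
    """
    critical_path_ps = 1e3 / fmax_ghz  # 1 / GHz = ns; x1000 -> ps
    return critical_path_ps + slack_ps


period_ps = implied_clock_period_ps(fmax_ghz=0.569, slack_ps=3243)
print(f"implied clock period = {period_ps:.0f} ps")
```

With the table's values this comes out at about 5000 ps (a ~1757 ps critical path plus 3243 ps of slack). If the flow computes Fmax differently, this inference does not hold.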

DRT converged from 23,479 violations (iteration 0) to 0 (around iteration 30) on the 10-macro hand-placed layout; the total flow took about 56 min.

Other workarounds kept

SKIP_CTS_REPAIR_TIMING=1: the ODB-1200 split-load bug still fires during CTS with the new layout, so this keeps the same workaround used for gemmini, NyuziProcessor, and bp_quad.

SYNTH_MEMORY_MAX_BITS=16384: needed because yosys still emits a $mem cell for the flip-flop-implemented importvector (ram_style="registers" prevents inference, but the cell is still emitted).
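Why 16384 specifically: the importvector is 1024 bits wide and 16 entries deep. The sketch below assumes the flow rejects any $mem cell whose total bit count exceeds SYNTH_MEMORY_MAX_BITS; the oversized helper is hypothetical and only illustrates that comparison.

```python
# Hypothetical sizing check; assumes the flow rejects any $mem cell whose
# total bit count (width x depth) exceeds SYNTH_MEMORY_MAX_BITS.
SYNTH_MEMORY_MAX_BITS = 16384

# (name, width_bits, depth_words) for memories yosys still emits as $mem
memories = [("tmatmul.importvector", 1024, 16)]


def oversized(mems, limit):
    """Return (name, bits) for every memory larger than the limit."""
    return [(name, w * d) for name, w, d in mems if w * d > limit]


print(oversized(memories, SYNTH_MEMORY_MAX_BITS))  # []
```

Since 1024 × 16 = 16,384, the new limit equals the importvector's size exactly and leaves no headroom for a larger $mem cell.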

mguthaus added 3 commits May 12, 2026 10:23
Replace the 3 monolithic ternip SRAMs with 18 smaller banks (sharing 2
new macro types) and update the pipelined_mem wrapper to instantiate
them with proper bank-select / data-slice logic (sizes in parentheses
are width x depth; fakeram7_<D>x<W> names are depth x width):

  vector_registers.pipelined_mem (16 x 4096) -> 8 x fakeram7_512x16   depth-banked
  tmatmul.exportvector           (16 x 1024) -> 2 x fakeram7_512x16   depth-banked
  tmatmul.importvector           (1024 x 16) -> 8 x fakeram7_16x128   width-banked

Both new macros are generated by designs/src/ternip/dev/gen_fakeram.py;
the SRAM_SIZES list is updated accordingly.  The depth-banked design
uses bank_sel = request_addr[high bits] to one-hot the ce_in lines,
and muxes rd_out using request_addr_q2 (aligned with the SRAM's 1-cycle
read latency).  The width-banked design slices request_w_data and
concatenates rd_out across banks.
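The bank-select and data-slice logic described above can be modeled behaviorally. This is a hypothetical Python sketch, not the pipelined_mem RTL: the sizes follow the banking list (8 × fakeram7_512x16 depth-banked, 8 × fakeram7_16x128 width-banked), the helper names are invented, and the SRAM's 1-cycle read latency is not modeled.

```python
from __future__ import annotations

# Hypothetical behavioral model; sizes follow the commit message.
DEPTH_BANK_WORDS = 512   # fakeram7_512x16: 512 words deep
NUM_DEPTH_BANKS = 8      # 8 x 512 = 4096 words (vector_registers)
SLICE_BITS = 128         # fakeram7_16x128: 128 bits wide
NUM_WIDTH_BANKS = 8      # 8 x 128 = 1024 bits (importvector)
OFFSET_BITS = DEPTH_BANK_WORDS.bit_length() - 1  # 9 low address bits


def depth_decode(addr: int) -> tuple[list[bool], int]:
    """High address bits -> one-hot ce_in lines; low bits -> in-bank offset."""
    bank = addr >> OFFSET_BITS
    offset = addr & (DEPTH_BANK_WORDS - 1)
    return [i == bank for i in range(NUM_DEPTH_BANKS)], offset


class DepthBankedMem:
    def __init__(self) -> None:
        self.banks = [[0] * DEPTH_BANK_WORDS for _ in range(NUM_DEPTH_BANKS)]

    def write(self, addr: int, data: int) -> None:
        ce_in, offset = depth_decode(addr)
        for bank, enabled in zip(self.banks, ce_in):
            if enabled:  # only the selected bank has ce_in asserted
                bank[offset] = data

    def read(self, addr: int) -> int:
        # For simplicity every bank is read at the same offset and the output
        # mux picks one by the high address bits. In hardware only the selected
        # bank is enabled, and the mux select is the delayed request_addr_q2 so
        # it lines up with the read data.
        _, offset = depth_decode(addr)
        rd_out = [bank[offset] for bank in self.banks]
        return rd_out[addr >> OFFSET_BITS]


def width_split(w_data: int) -> list[int]:
    """Slice a 1024-bit write word into eight 128-bit bank inputs."""
    mask = (1 << SLICE_BITS) - 1
    return [(w_data >> (i * SLICE_BITS)) & mask for i in range(NUM_WIDTH_BANKS)]


def width_concat(rd_out: list[int]) -> int:
    """Concatenate the eight 128-bit bank outputs back into one word."""
    word = 0
    for i, piece in enumerate(rd_out):
        word |= piece << (i * SLICE_BITS)
    return word


mem = DepthBankedMem()
mem.write(1234, 0xBEEF)  # address 1234 -> bank 2, offset 210
print(depth_decode(1234)[1], mem.read(1234) == 0xBEEF)
```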

Macro count goes 3 -> 18 (8 + 2 + 8), total macro area goes 60_109 -> 49_438
um^2 (-18%), per-macro pin count drops by 8-16x compared to the
1024-wide importvector.

Drop macro_placement.tcl -- with 18 macros, hand placement is no longer
practical; let RTLMP handle it.  CORE_UTILIZATION 45 -> 30 (more
breathing room for distributed macros), MACRO_PLACE_HALO 20 -> 8.

Result: DRT plateaus at ~3500 Lef58EolKeepOut M4 violations -- WORSE
than the prior commit's 1780 with 3 hand-placed macros.  Banking trades
one big macro perimeter for many small perimeters; each perimeter is a
region where signals escape M4 pins and immediately collide with the
macro's M4 power straps, so 18 macros = 18 hot zones vs 3.

Diagnostic finding (motivating the next change, not in this commit):
gen_fakeram.py's M4 power grid is 8x denser than the working asap7
fakeram macros used by liteeth and NVDLA (supply_track_pitch 0.048 um
vs 0.384 um).  Our pattern leaves ~0 free M4 routing tracks between
adjacent VDD/VSS straps; liteeth leaves ~6.  That's why DRT can never
legalize the M4 signal escapes -- not the layout, not the halo, not the
banking, but the macro power-grid density itself.
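The "~0 vs ~6 free tracks" comparison can be reproduced as a first-order estimate. The sketch is hypothetical: it assumes the strap pitch is centre-to-centre between neighbouring VDD/VSS straps, uses the rail widths quoted in the later supply-rail fix (0.024 µm pin_w fallback for ours, 0.096 µm for liteeth / NVDLA), and ignores spacing and end-of-line rules, so it is an optimistic count.

```python
# First-order estimate only: ignores min-spacing and end-of-line rules,
# so it is an optimistic (upper-bound) track count.
M4_TRACK_PITCH_NM = 48


def free_tracks_between_rails(rail_pitch_nm: int, rail_w_nm: int) -> int:
    """Whole M4 track pitches that fit in the gap between adjacent straps.

    Assumes rail_pitch_nm is centre-to-centre between neighbouring
    (alternating VDD/VSS) straps.
    """
    gap_nm = max(rail_pitch_nm - rail_w_nm, 0)
    return gap_nm // M4_TRACK_PITCH_NM


ours = free_tracks_between_rails(rail_pitch_nm=48, rail_w_nm=24)      # rail_w fell back to pin_w
liteeth = free_tracks_between_rails(rail_pitch_nm=384, rail_w_nm=96)  # liteeth / NVDLA geometry
print(ours, liteeth)  # 0 6
```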
… importvector

DRC drops from 1780 → 693 (61% reduction).  Three changes working together:

1. importvector (1024 × 16) replaced with flip-flops.  Only 16 entries —
   too small to justify a macro, and the 8 × fakeram7_16x128 width-banked
   variant contributed ~half the M4 Lef58EolKeepOut perimeter.
   ram_style="registers" prevents yosys from inferring it as $mem, but
   yosys still emits the cell, so SYNTH_MEMORY_MAX_BITS is bumped to 16384.

2. macro_placement.tcl hand-places the remaining 10 fakeram7_512x16
   macros in two cnn-style MY+R0 arrays with pin edges facing each
   other across a 15 µm channel.  8 banks of vector_registers in a
   2 × 4 array at bottom-left, 2 banks of exportvector above it.

3. CORE_UTILIZATION raised 30 → 45; MACRO_PLACE_HALO 8 → 4.  The
   hand placement keeps macros away from each other, so the wide
   halo no longer carries its own weight.
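One way to generate such a facing-pair array is sketched below. Everything here is an assumption for illustration: the macro size, origin, and bank names are hypothetical (the PR does not list them), "2 × 4" is read as two columns of four rows, and the fakeram pins are assumed to sit on the macro's left edge in R0, so mirroring the left column (MY) puts its pins on the right edge facing the R0 column across the 15 µm channel. It only computes coordinates; it does not emit macro_placement.tcl commands.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    name: str
    x_um: float   # lower-left corner of the placed bounding box
    y_um: float
    orient: str


def facing_pair_array(names, macro_w_um, macro_h_um, origin_x_um, origin_y_um,
                      channel_um=15.0, row_pitch_um=None):
    """Two columns of macros whose pin edges face each other.

    Assumes pins sit on the macro's left edge in R0, so the left column is
    mirrored (MY) to put its pins on the right edge, facing the R0 column
    across the channel.
    """
    if row_pitch_um is None:
        row_pitch_um = macro_h_um  # rows abut vertically by default
    right_col_x = origin_x_um + macro_w_um + channel_um
    placements = []
    for i, name in enumerate(names):
        row, col = divmod(i, 2)
        y = origin_y_um + row * row_pitch_um
        if col == 0:
            placements.append(Placement(name, origin_x_um, y, "MY"))
        else:
            placements.append(Placement(name, right_col_x, y, "R0"))
    return placements


# Hypothetical bank names and macro size; the real values are not in the PR.
banks = [f"vector_registers_bank{i}" for i in range(8)]
for p in facing_pair_array(banks, macro_w_um=30.0, macro_h_um=40.0,
                           origin_x_um=10.0, origin_y_um=10.0):
    print(p)
```

Keeping the channel width as a parameter makes it easy to trade routing room between the facing pin edges against die area.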

Residual 693 violations are still Lef58EolKeepOut on M4 — the
bsg-fakeram supply_track_pitch (0.048 µm, 8× denser than liteeth /
NVDLA at 0.384 µm) leaves no free M4 routing tracks between the
power straps.  That root-cause fix is tracked separately.

Three coupled bugs in the M4 supply-rail layout, fixed together:

1. supply_track_pitch was 0.048 µm (the M4 routing-track grid itself),
   placing a power rail edge on every M4 track.  Any M4 signal escape
   from a macro pin had a rail edge inside its Lef58 end-of-line
   keepout.  Bumped to 0.384 µm (alternating VDD/VSS) and added a
   supply_rail_w of 0.096 µm, matching the working liteeth / NVDLA
   asap7 fakerams (8× sparser, 4× thicker).

2. Supply rails were laid at x ∈ [snap_w, w−snap_w] = [0.054, w−0.054].
   Signal pins protrude from the macro edge to x = pin_protrusion =
   0.072, so the rail's left edge (0.054) was inside the pin's
   x-range, creating a hard M4 short between VDD/VSS and every signal
   pin whose y happened to fall in a rail band.  The wider rails from
   #1 turned the previously narrow rails' inherent edge abutment
   (geometrically tangent in y) into a true area overlap.

   Fix: introduce supply_margin_xy = pin_protrusion + lr_pitch
   (= 0.120 µm), leaving a 0.048 µm clean gap between pin tip and
   rail, matching liteeth's geometry exactly.

3. supply_rail_w wasn't exposed as a parameter — it fell back to
   pin_w (0.024 µm), which gave 4×-too-thin rails that aliased with
   the routing grid and made #1 worse.

Net effect on ternip asap7: 693 → 0 DRC violations across the same
10-macro hand-placed layout, ~1 hour DRT.  Slack +3243 ps, util 45.5%,
power 18.6 mW unchanged.
@mguthaus mguthaus merged commit f33e702 into main May 13, 2026