ternip asap7: hand-placed banked SRAMs + gen_fakeram supply-rail fix → 0 DRC by mguthaus · Pull Request #141 · VLSIDA/HighTide

mguthaus · 2026-05-13T20:18:13Z

Follow-up to #125. Brings ternip asap7 from 1780 → 0 DRC across three coordinated changes:

What changed

| Commit | Change | DRC |
| --- | --- | --- |
| `576d1c2` | Bank the three large SRAMs into 18 smaller macros; let RTLMP place | ~3500 |
| `b1a0dfa` | Hand-place 10 banked SRAMs in cnn-style MY+R0 arrays; replace 16×1024 `importvector` with flip-flops (only 16 entries) | 693 |
| `cf33c2d` | Fix three coupled bugs in `gen_fakeram.py` M4 supply-rail geometry | 0 |

The supply-rail fix (`cf33c2d`)

gen_fakeram.py produced macros that fundamentally couldn't route at zero DRC because of three coupled bugs in the M4 supply-rail layout:

1. `supply_track_pitch` was 0.048 µm, which is the M4 routing-track grid itself. Every M4 track had a power rail edge on it, so any signal pin escape had a rail edge inside its Lef58 end-of-line keepout zone. Fixed to 0.384 µm (alternating VDD/VSS).
2. No explicit `supply_rail_w`: it fell back to `pin_w` (0.024 µm), giving rails 4× too thin that aliased to the routing grid. Set to 0.096 µm explicitly.
3. Supply rails extended into the pin-protrusion x-range: rails spanned x ∈ [0.054, w−0.054] while signal pins protrude to x = 0.072, creating a hard M4 short. The fix introduces `supply_margin_xy = pin_protrusion + lr_pitch` (0.120 µm) for a clean 0.048 µm gap.

All three values now match the working liteeth/NVDLA asap7 fakerams.
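
The arithmetic behind the three values can be checked with a short Python sketch. This is not code from `gen_fakeram.py`: the dictionary keys, the `rail_x0` name, and both helper functions are hypothetical, and `lr_pitch` is taken to be 0.048 µm only because 0.120 − 0.072 implies it.

```python
from fractions import Fraction as F

# Values quoted in this PR, in micrometres. Exact fractions avoid float noise.
M4_TRACK_PITCH = F("0.048")    # M4 routing-track grid; lr_pitch is inferred to equal it
PIN_PROTRUSION = F("0.072")    # signal pins reach x = 0.072 from the macro edge

# Hypothetical containers for the before/after parameters (not the script's own API).
OLD = {"supply_track_pitch": F("0.048"),
       "supply_rail_w": F("0.024"),          # fell back to pin_w
       "rail_x0": F("0.054")}                # rails started at snap_w
NEW = {"supply_track_pitch": F("0.384"),
       "supply_rail_w": F("0.096"),
       "rail_x0": PIN_PROTRUSION + M4_TRACK_PITCH}   # supply_margin_xy


def pin_to_rail_gap(cfg):
    """Gap in x between a pin tip and the rail start; negative means they overlap in x."""
    return cfg["rail_x0"] - PIN_PROTRUSION


def tracks_per_rail_pitch(cfg):
    """How many M4 routing tracks fit in one supply-rail pitch."""
    return cfg["supply_track_pitch"] / M4_TRACK_PITCH


for name, cfg in (("old", OLD), ("new", NEW)):
    print(name,
          "gap_um =", float(pin_to_rail_gap(cfg)),
          "tracks_per_pitch =", int(tracks_per_rail_pitch(cfg)))
```

Under these numbers the old geometry has a negative pin-to-rail gap (the rail starts inside the pin's x-range) and one rail per routing track, while the new geometry has a 0.048 µm gap and one rail every 8 tracks.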

Final QoR (`cf33c2d`)

| Metric | Value |
| --- | --- |
| DRC violations | 0 |
| Die area | 131,461 µm² |
| Utilization | 45.5% |
| Cells (seq + comb + buf/inv) | 131,041 |
| Macros | 10 |
| IOs | 563 |
| Slack | +3243 ps |
| Skew | 96.25 ps |
| Fmax | 0.569 GHz |
| Power | 18.6 mW (clock 7.36 mW) |

DRT walked from 23,479 violations (iter 0) → 0 (iter ~30) on a 10-macro hand-placed layout; total flow ~56 min.

Other workarounds kept

SKIP_CTS_REPAIR_TIMING=1 — the ODB-1200 split-load bug fires during CTS even with the new layout; same workaround as gemmini / NyuziProcessor / bp_quad.

SYNTH_MEMORY_MAX_BITS=16384 — needed because yosys still emits a $mem cell for the FF-implemented importvector (ram_style="registers" prevents inference but not the cell).

Replace the 3 monolithic ternip SRAMs with 18 smaller banks (sharing 2 new macro types) and update the pipelined_mem wrapper to instantiate them with proper bank-select / data-slice logic:

- vector_registers.pipelined_mem (16 x 4096) -> 8 x fakeram7_512x16, depth-banked
- tmatmul.exportvector (16 x 1024) -> 2 x fakeram7_512x16, depth-banked
- tmatmul.importvector (1024 x 16) -> 8 x fakeram7_16x128, width-banked

Both new macros are generated by designs/src/ternip/dev/gen_fakeram.py; the SRAM_SIZES list is updated accordingly.

The depth-banked design uses bank_sel = request_addr[high bits] to one-hot the ce_in lines, and muxes rd_out using request_addr_q2 (aligned with the SRAM's 1-cycle read latency). The width-banked design slices request_w_data and concatenates rd_out across banks.

Macro count goes 3 -> 18 (8 + 2 + 8), total macro area goes 60_109 -> 49_438 um^2 (-18%), per-macro pin count drops by 8-16x compared to the 1024-wide importvector.

Drop macro_placement.tcl -- with 18 macros, hand placement is no longer practical; let RTLMP handle it. CORE_UTILIZATION 45 -> 30 (more breathing room for distributed macros), MACRO_PLACE_HALO 20 -> 8.

Result: DRT plateaus at ~3500 Lef58EolKeepOut M4 violations -- WORSE than the prior commit's 1780 with 3 hand-placed macros. Banking trades one big macro perimeter for many small perimeters; each perimeter is a region where signals escape M4 pins and immediately collide with the macro's M4 power straps, so 18 macros = 18 hot zones vs 3.

Diagnostic finding (motivating the next change, not in this commit): gen_fakeram.py's M4 power grid is 8x denser than the working asap7 fakeram macros used by liteeth and NVDLA (supply_track_pitch 0.048 um vs 0.384 um). Our pattern leaves ~0 free M4 routing tracks between adjacent VDD/VSS straps; liteeth leaves ~6. That's why DRT can never legalize the M4 signal escapes -- not the layout, not the halo, not the banking, but the macro power-grid density itself.
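
The two banking schemes described in this commit message can be illustrated with a small behavioural model in Python. It is a sketch only, not the pipelined_mem RTL: the function names are hypothetical, the least-significant slice is assumed to map to bank 0, and the registered read path (the request_addr_q2 mux) is left out.

```python
BANK_DEPTH = 512   # fakeram7_512x16: 512 words x 16 bits
SLICE_W = 128      # fakeram7_16x128: 16 words x 128 bits


def depth_bank_select(addr, n_banks, bank_depth=BANK_DEPTH):
    """Return (one-hot chip-enable list, address inside the selected bank)."""
    bank_sel, local = divmod(addr, bank_depth)   # high address bits, low address bits
    if not 0 <= bank_sel < n_banks:
        raise ValueError("address out of range")
    ce = [int(i == bank_sel) for i in range(n_banks)]
    return ce, local


def width_slice(word, n_banks, slice_w=SLICE_W):
    """Split a wide write word into per-bank slices (assumed LSB slice first)."""
    mask = (1 << slice_w) - 1
    return [(word >> (i * slice_w)) & mask for i in range(n_banks)]


def width_concat(slices, slice_w=SLICE_W):
    """Rebuild the wide read word from per-bank read data."""
    word = 0
    for i, s in enumerate(slices):
        word |= s << (i * slice_w)
    return word


# vector_registers: 4096 words over 8 banks, so the last address lands in bank 7.
print(depth_bank_select(4095, n_banks=8))
```

Depth banking enables exactly one macro per access, while width banking enables all of them and reassembles the word, which is why the width-banked importvector contributed eight macro perimeters for only 16 entries.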

… importvector DRC drops from 1780 → 693 (61% reduction). Three changes working together:

1. importvector (1024 × 16) replaced with flip-flops. Only 16 entries: too small to justify a macro, and the 8 × fakeram7_16x128 width-banked variant contributed ~half the M4 Lef58EolKeepOut perimeter. ram_style="registers" prevents yosys from inferring it as $mem, but yosys still emits the cell, so SYNTH_MEMORY_MAX_BITS is bumped to 16384.
2. macro_placement.tcl hand-places the remaining 10 fakeram7_512x16 macros in two cnn-style MY+R0 arrays with pin edges facing each other across a 15 µm channel. 8 banks of vector_registers in a 2 × 4 array at bottom-left, 2 banks of exportvector above it.
3. CORE_UTILIZATION raised 30 → 45; MACRO_PLACE_HALO 8 → 4. The hand placement keeps macros away from each other, so the wide halo no longer carries its own weight.

Residual 693 violations are still Lef58EolKeepOut on M4: the bsg-fakeram supply_track_pitch (0.048 µm, 8× denser than liteeth / NVDLA at 0.384 µm) leaves no free M4 routing tracks between the power straps. That root-cause fix is tracked separately.
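
As a rough illustration of the MY+R0 arrangement, the sketch below generates lower-left coordinates for two facing columns. It does not reproduce macro_placement.tcl: the function name, the macro dimensions in the example, and the reading of "2 × 4" as two columns of four rows are all assumptions, as is the premise that the pins sit on the left edge in R0 (so the left column is mirrored).

```python
def facing_columns(n_rows, macro_w, macro_h, channel, x0=0.0, y0=0.0, row_gap=0.0):
    """Place two columns of macros whose pin edges face each other across `channel`.

    Assumes pins are on the left edge in R0, so the left column is mirrored (MY)
    and the right column stays R0. Returns (column, row, x, y, orient) tuples,
    where (x, y) is the lower-left corner of each macro's bounding box.
    """
    sites = []
    for row in range(n_rows):
        y = y0 + row * (macro_h + row_gap)
        sites.append(("left", row, x0, y, "MY"))
        sites.append(("right", row, x0 + macro_w + channel, y, "R0"))
    return sites


# Hypothetical 20 x 10 um macros; only the 15 um channel comes from the commit message.
for site in facing_columns(n_rows=4, macro_w=20.0, macro_h=10.0, channel=15.0):
    print(site)
```

Keeping all pin escapes inside one shared channel concentrates the M4 congestion in a region the router can be given room for, which is the stated motivation for facing the pin edges toward each other.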

Three coupled bugs in the M4 supply-rail layout, fixed together:

1. supply_track_pitch was 0.048 µm (the M4 routing-track grid itself), placing a power rail edge on every M4 track. Any M4 signal escape from a macro pin had a rail edge inside its Lef58 end-of-line keepout. Bumped to 0.384 µm (alternating VDD/VSS) and added a supply_rail_w of 0.096 µm, matching the working liteeth / NVDLA asap7 fakerams (8× sparser, 4× thicker).
2. Supply rails were laid at x ∈ [snap_w, w−snap_w] = [0.054, w−0.054]. Signal pins protrude from the macro edge to x = pin_protrusion = 0.072, so the rail's left edge (0.054) was inside the pin's x-range, creating a hard M4 short between VDD/VSS and every signal pin whose y happened to fall in a rail band. The wider rails from #1 turned the previously-narrow rails' inherent edge-abutment (geometrically tangent in y) into a true area overlap. Fix: introduce supply_margin_xy = pin_protrusion + lr_pitch (= 0.120 µm), leaving a 0.048 µm clean gap between pin tip and rail, matching liteeth's geometry exactly.
3. supply_rail_w wasn't exposed as a parameter; it fell back to pin_w (0.024 µm), which gave 4×-too-thin rails that aliased with the routing grid and made #1 worse.

Net effect on ternip asap7: 693 → 0 DRC violations across the same 10-macro hand-placed layout, ~1 hour DRT. Slack +3243 ps, util 45.5%, power 18.6 mW unchanged.

mguthaus added 3 commits May 12, 2026 10:23

mguthaus merged commit f33e702 into main May 13, 2026

mguthaus merged 3 commits into main from ternip
