Replace the 3 monolithic ternip SRAMs with 18 smaller banks (sharing 2 new macro types) and update the pipelined_mem wrapper to instantiate them with proper bank-select / data-slice logic:

  vector_registers.pipelined_mem (16 x 4096) -> 8 x fakeram7_512x16, depth-banked
  tmatmul.exportvector (16 x 1024) -> 2 x fakeram7_512x16, depth-banked
  tmatmul.importvector (1024 x 16) -> 8 x fakeram7_16x128, width-banked

Both new macros are generated by designs/src/ternip/dev/gen_fakeram.py; the SRAM_SIZES list is updated accordingly.

The depth-banked design uses bank_sel = request_addr[high bits] to one-hot the ce_in lines, and muxes rd_out using request_addr_q2 (aligned with the SRAM's 1-cycle read latency). The width-banked design slices request_w_data and concatenates rd_out across banks.

Macro count goes 3 -> 18 (8 + 2 + 8), total macro area goes 60_109 -> 49_438 um^2 (-18%), and per-macro pin count drops by 8-16x compared to the 1024-wide importvector.

Drop macro_placement.tcl -- with 18 macros, hand placement is no longer practical; let RTLMP handle it. CORE_UTILIZATION 45 -> 30 (more breathing room for distributed macros), MACRO_PLACE_HALO 20 -> 8.

Result: DRT plateaus at ~3500 Lef58EolKeepOut M4 violations -- WORSE than the prior commit's 1780 with 3 hand-placed macros. Banking trades one big macro perimeter for many small perimeters; each perimeter is a region where signals escape M4 pins and immediately collide with the macro's M4 power straps, so 18 macros = 18 hot zones vs 3.

Diagnostic finding (motivating the next change, not in this commit): gen_fakeram.py's M4 power grid is 8x denser than the working asap7 fakeram macros used by liteeth and NVDLA (supply_track_pitch 0.048 um vs 0.384 um). Our pattern leaves ~0 free M4 routing tracks between adjacent VDD/VSS straps; liteeth leaves ~6. That's why DRT can never legalize the M4 signal escapes -- not the layout, not the halo, not the banking, but the macro power-grid density itself.
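The two banking schemes described above can be sketched as a small behavioral model. This is only an illustration, not the pipelined_mem RTL: the function names are hypothetical, the bank ordering (bank 0 holds the least-significant slice) is an assumption, and the one-cycle read latency handled by request_addr_q2 is not modeled.

```python
# Behavioral sketch only; names and bank ordering are assumptions, not the RTL's.
BANK_DEPTH = 512   # words per fakeram7_512x16 bank
SLICE_BITS = 128   # data width of one fakeram7_16x128 bank


def depth_bank_decode(addr, num_banks):
    """Split a flat word address into (one-hot chip enables, in-bank address)."""
    bank_sel = addr // BANK_DEPTH  # the high address bits pick the bank
    if not 0 <= bank_sel < num_banks:
        raise ValueError("address out of range")
    ce = [int(b == bank_sel) for b in range(num_banks)]
    return ce, addr % BANK_DEPTH


def width_bank_split(word, num_banks):
    """Slice one wide write word into per-bank pieces (bank 0 = low bits)."""
    mask = (1 << SLICE_BITS) - 1
    return [(word >> (b * SLICE_BITS)) & mask for b in range(num_banks)]


def width_bank_join(slices):
    """Concatenate per-bank read data back into one wide word."""
    word = 0
    for b, piece in enumerate(slices):
        word |= piece << (b * SLICE_BITS)
    return word
```

For the 4096-deep vector_registers case, `depth_bank_decode(4095, 8)` enables only the last of the 8 banks at in-bank address 511, and `width_bank_join(width_bank_split(w, 8))` returns `w` for any 1024-bit value.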
… importvector

DRC drops from 1780 → 693 (61% reduction). Three changes working together:

1. importvector (1024 × 16) replaced with flip-flops. It has only 16 entries, too small to justify a macro, and the 8 × fakeram7_16x128 width-banked variant contributed ~half the M4 Lef58EolKeepOut perimeter. ram_style="registers" prevents yosys from inferring it as $mem, but yosys still emits the cell, so SYNTH_MEMORY_MAX_BITS is bumped to 16384.
2. macro_placement.tcl hand-places the remaining 10 fakeram7_512x16 macros in two cnn-style MY+R0 arrays with pin edges facing each other across a 15 µm channel: 8 banks of vector_registers in a 2 × 4 array at bottom-left, 2 banks of exportvector above it.
3. CORE_UTILIZATION raised 30 → 45; MACRO_PLACE_HALO 8 → 4. The hand placement keeps macros away from each other, so the wide halo no longer carries its own weight.

Residual 693 violations are still Lef58EolKeepOut on M4: the bsg-fakeram supply_track_pitch (0.048 µm, 8× denser than liteeth / NVDLA at 0.384 µm) leaves no free M4 routing tracks between the power straps. That root-cause fix is tracked separately.
Three coupled bugs in the M4 supply-rail layout, fixed together:

1. supply_track_pitch was 0.048 µm (the M4 routing-track grid itself), placing a power rail edge on every M4 track. Any M4 signal escape from a macro pin had a rail edge inside its Lef58 end-of-line keepout. Bumped to 0.384 µm (alternating VDD/VSS) and added a supply_rail_w of 0.096 µm, matching the working liteeth / NVDLA asap7 fakerams (8× sparser, 4× thicker).
2. Supply rails were laid at x ∈ [snap_w, w−snap_w] = [0.054, w−0.054]. Signal pins protrude from the macro edge to x = pin_protrusion = 0.072, so the rail's left edge (0.054) was inside the pin's x-range, creating a hard M4 short between VDD/VSS and every signal pin whose y happened to fall in a rail band. The wider rails from #1 turned the previously-narrow rails' inherent edge-abutment (geometrically tangent in y) into a true area overlap. Fix: introduce supply_margin_xy = pin_protrusion + lr_pitch (= 0.120 µm), leaving a 0.048 µm clean gap between pin tip and rail, matching liteeth's geometry exactly.
3. supply_rail_w wasn't exposed as a parameter; it fell back to pin_w (0.024 µm), which gave 4×-too-thin rails that aliased with the routing grid and made #1 worse.

Net effect on ternip asap7: 693 → 0 DRC violations across the same 10-macro hand-placed layout, ~1 hour DRT. Slack +3243 ps, util 45.5%, power 18.6 mW unchanged.
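The arithmetic behind the three fixes can be checked with a short sketch. The numbers are the ones quoted in the commit message, converted to integer nanometres; the constant and function names are hypothetical and do not come from gen_fakeram.py, and lr_pitch = 0.048 µm is inferred from pin_protrusion + lr_pitch = 0.120 µm.

```python
# Values from the commit message, in nanometres; names are illustrative only.
PIN_PROTRUSION = 72                     # signal pins span x in [0, 72]
LR_PITCH = 48                           # inferred: 120 - 72
OLD_MARGIN = 54                         # rails used to start at snap_w
NEW_MARGIN = PIN_PROTRUSION + LR_PITCH  # supply_margin_xy

OLD_PITCH, NEW_PITCH = 48, 384          # supply_track_pitch before / after
OLD_RAIL_W, NEW_RAIL_W = 24, 96         # pin_w fallback vs explicit supply_rail_w


def pin_to_rail_gap(margin):
    """Gap in x between pin tip and rail start; negative means they overlap."""
    return margin - PIN_PROTRUSION


print(pin_to_rail_gap(OLD_MARGIN))   # -18: rail starts inside the pin's x-range
print(pin_to_rail_gap(NEW_MARGIN))   # 48: one clean gap between pin tip and rail
print(NEW_PITCH // OLD_PITCH)        # 8: rails are 8x sparser
print(NEW_RAIL_W // OLD_RAIL_W)      # 4: rails are 4x thicker
```

Working in integer nanometres avoids floating-point noise when comparing values such as 0.072 + 0.048 against 0.120.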
Follow-up to #125. Brings ternip asap7 from 1780 → 0 DRC across three coordinated changes:
What changed
- importvector with flip-flops (only 16 entries)
- gen_fakeram.py M4 supply-rail geometry
The supply-rail fix (cf33c2d)
gen_fakeram.py produced macros that fundamentally couldn't route at zero DRC because of three coupled bugs in the M4 supply-rail layout:
1. supply_track_pitch was 0.048 µm, the M4 routing-track grid itself. Every M4 track had a power rail edge on it, so any signal pin escape had a rail edge inside its Lef58 end-of-line keepout zone. Fixed to 0.384 µm (alternating VDD/VSS).
2. supply_rail_w was not exposed as a parameter and fell back to pin_w (0.024 µm), giving 4×-too-thin rails that aliased to the routing grid. Set to 0.096 µm explicitly.
3. Rails started at x = 0.054 µm, inside the x-range of the signal pins (which protrude to 0.072 µm). They are now inset by supply_margin_xy = pin_protrusion + lr_pitch (0.120 µm) for a clean 0.048 µm gap.
All three values now match the working liteeth/NVDLA asap7 fakerams.
Final QoR (cf33c2d)
DRT walked from 23,479 violations (iter 0) → 0 (iter ~30) on a 10-macro hand-placed layout; total flow ~56 min.
Other workarounds kept
- SKIP_CTS_REPAIR_TIMING=1: the ODB-1200 split-load bug fires during CTS even with the new layout; same workaround as gemmini / NyuziProcessor / bp_quad.
- SYNTH_MEMORY_MAX_BITS=16384: needed because yosys still emits a $mem cell for the FF-implemented importvector (ram_style="registers" prevents inference but not the cell).