Added Ternip by sifferman · Pull Request #125 · VLSIDA/HighTide

sifferman · 2026-05-09T00:24:31Z

I had Claude do all this for me. I've done 0 debugging on my own.

Here is the output of DECISIONS.md:

Ternip

Ternip is a custom fixed-point ternary matrix-multiply inference processor written in SystemVerilog. It requires native SV synthesis via yosys-slang (SYNTH_HDL_FRONTEND: slang) and three FakeRAM macros replacing the behavioral ternip_pipelined_mem module.

asap7

Status: not finishing — detail routing does not converge
Last updated: 2026-05-08

Configuration

SYNTH_HDL_FRONTEND = slang (native SystemVerilog — no sv2v)
SYNTH_HIERARCHICAL = 0 (hierarchical mode caused CTS ODB-1200 InsertBufferBeforeLoads failure)
CORE_UTILIZATION = 25, PLACE_DENSITY = 0.55
MACRO_PLACE_HALO = 12 12
TNS_END_PERCENT = 100
Clock: clk_i, 5000 ps (200 MHz)
CONFIG_FILENAME set via VERILOG_DEFINES; hightide.svh resolved from VERILOG_INCLUDE_DIRS

FakeRAM macros (asap7)

ternip_pipelined_mem is the sole memory primitive, parameterized by DATA_WIDTH and NUM_ENTRIES. Three instances are synthesized; each is replaced by a fakeram7_* macro via ternip_pipelined_mem_fakeram7.v.

Macro	DATA_WIDTH	NUM_ENTRIES	Instance	Source in ternip repo
`fakeram7_4096x16`	16	4096	`vector_registers.pipelined_mem`	`ternip_vector_registers.sv` — `FixedPointPrecision × (D × NumVectorRegisters)` = 16 × 4096
`fakeram7_1024x16`	16	1024	`tmatmul/exportvector`	`fus/ternip_tmatmul.sv` — export vector buffer, `DATA_WIDTH=FixedPointPrecision`, `NUM_ENTRIES=D`
`fakeram7_16x1024`	1024	16	`tmatmul/importvector`	`fus/ternip_tmatmul.sv` — import vector buffer, `DATA_WIDTH=D×FixedPointPrecision/TmatmulParallelism`, `NUM_ENTRIES=TmatmulParallelism`

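With D=1024, FixedPointPrecision=16, and TmatmulParallelism=64, the formula above gives importvector DATA_WIDTH = 1024×16/64 = 256 and NUM_ENTRIES = 64. That does not match the 1024-bit, 16-entry `fakeram7_16x1024` macro listed in the table, whose dimensions correspond to TmatmulParallelism=16 (1024×16/16 = 1024). No later note in this file resolves the discrepancy.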
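The sizing formulas quoted in the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: `NUM_VECTOR_REGISTERS = 4` is inferred from 4096 / D rather than stated in the source, and `import_vector_shape` is a hypothetical helper name, not something from the ternip repo.

```python
# Parameter values quoted in the text. NUM_VECTOR_REGISTERS is an assumption
# inferred from 4096 / D; it is not stated in the source.
D = 1024
FIXED_POINT_PRECISION = 16
NUM_VECTOR_REGISTERS = 4


def import_vector_shape(parallelism: int) -> tuple:
    """Return (DATA_WIDTH, NUM_ENTRIES) of the import vector buffer."""
    return D * FIXED_POINT_PRECISION // parallelism, parallelism


# (DATA_WIDTH, NUM_ENTRIES) for the two buffers that do not depend on parallelism.
vector_registers = (FIXED_POINT_PRECISION, D * NUM_VECTOR_REGISTERS)
export_vector = (FIXED_POINT_PRECISION, D)

for p in (64, 16):
    width, entries = import_vector_shape(p)
    print(f"TmatmulParallelism={p}: DATA_WIDTH={width}, NUM_ENTRIES={entries}")
```

Only a parallelism of 16 reproduces the 1024 × 16 shape of `fakeram7_16x1024`; a parallelism of 64 would call for a 256-bit, 64-entry memory instead.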

LEF/LIB files generated by designs/src/ternip/dev/gen_fakeram.py --platform asap7. Macro geometry targets a 2:1 aspect ratio; pin pitch matches bsg_fakeram's proven asap7 format (M4, 0.144 µm pitch, 0.072 µm protrusion).

Floorplan — macro placement

Die: 514.9 × 514.9 µm at 25% utilization. RTLMP places all three macros automatically:

Macro	Instance	Origin (x, y) µm	Orient	Size (w × h) µm
`fakeram7_4096x16`	`vector_registers.pipelined_mem`	13.0, 101.3	R0	256.0 × 128.3
`fakeram7_1024x16`	`tmatmul/exportvector`	141.1, 77.3	R180	128.0 × 64.3
`fakeram7_16x1024`	`tmatmul/importvector`	501.9, 161.9	R180	128.0 × 148.8

Detail routing — convergence failure

Global routing passes cleanly (0 overflow, 1.79% resource usage). Detail routing does not converge; the router reaches the 50-iteration limit with ~4,150 eolKeepOut violations remaining.

Selected per-iteration violation counts:

Iteration	Total violations	eolKeepOut
0	13,992	~13,992
1	5,225	~5,225
3	4,322	~4,322
4	4,275	4,150
10	4,222	4,160
16	4,184	4,162
24	4,155	4,150
45	4,151	4,150
47	4,147	4,146
50	~4,204	~4,150

The count drops sharply in iterations 0–3 (general routing cleanup), then plateaus at ~4,150 eolKeepOut violations from iteration 4 onward with no further improvement.

Root cause: fakeram7_16x1024 has 2 × 1024 data pins + 4 address pins + 3 control pins = 2055 signal pins at 0.144 µm pitch on a 148.8 µm-tall body. The macro sits in the upper-right corner of the die (x = 502 µm in a 515 µm-wide die) in R180 orientation. The resulting pin clusters at the macro edges create a local routing hot spot that the detail router cannot escape — every attempted reroute around one eolKeepOut violation displaces another.
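The pin count in the root-cause paragraph follows directly from the macro shape. A minimal sketch, assuming separate read and write data buses (the "2 × 1024" term) and the same 3 control pins on every macro; the source only states the control-pin count for `fakeram7_16x1024`, and `signal_pin_count` is a hypothetical helper:

```python
def signal_pin_count(data_width: int, num_entries: int, control_pins: int = 3) -> int:
    """Signal pins of a single-port fakeram-style macro (supply pins excluded)."""
    # Address bits needed to index num_entries words (exact for powers of two).
    address_pins = (num_entries - 1).bit_length()
    # Two data buses (read and write), plus address and control pins.
    return 2 * data_width + address_pins + control_pins


for name, width, entries in [
    ("fakeram7_4096x16", 16, 4096),
    ("fakeram7_1024x16", 16, 1024),
    ("fakeram7_16x1024", 1024, 16),
]:
    print(name, signal_pin_count(width, entries))
```

Under these assumptions the wide import buffer carries more than forty times as many signal pins as either 16-bit macro, which is consistent with the hot spot forming around that macro alone.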

Global routing sees no overflow because the congestion is localized to the pin-access layer directly adjacent to the macro edge; the global router operates at a coarser granularity and does not model per-pin eolKeepOut constraints.

Open fix

Increase pin_track_count from 3 to 6 in gen_fakeram.py for fakeram7_16x1024 (doubling the pin pitch from 0.144 µm to 0.288 µm). This grows the macro height from 148.8 µm to ~296 µm but gives the detail router 2× more routing space between adjacent pins. Requires regenerating the LEF/LIB and rerunning from floorplan.
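The height estimate can be reproduced with a rough pin-span model. This is a sketch, not the logic of gen_fakeram.py: it assumes a 0.048 µm track pitch (implied by pin_track_count = 3 giving a 0.144 µm pin pitch) and assumes the signal pins are split evenly over two macro edges, which the source does not state.

```python
SIGNAL_PINS = 2055          # from the root-cause analysis above
TRACK_PITCH_UM = 0.144 / 3  # implied by pin_track_count = 3 -> 0.144 um pin pitch
PIN_EDGES = 2               # assumption: pins split evenly over two edges


def pin_pitch_um(pin_track_count: int) -> float:
    """Pin pitch in um for a given number of tracks per pin."""
    return pin_track_count * TRACK_PITCH_UM


def pin_span_um(pin_track_count: int) -> float:
    """Edge length in um needed to fit the pins on each edge, ignoring margins."""
    return SIGNAL_PINS * pin_pitch_um(pin_track_count) / PIN_EDGES


for count in (3, 6):
    print(f"pin_track_count={count}: pitch {pin_pitch_um(count):.3f} um, "
          f"span {pin_span_um(count):.1f} um")
```

The model gives roughly 148 µm and 296 µm, close to the 148.8 µm and ~296 µm quoted above, so the macro height appears to be set almost entirely by the pin count.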

With distributed macro placement (4096x16 along the bottom; 16x1024 and 1024x16 sharing the top row with a 52 µm channel between them), CORE_UTILIZATION raised from 25 to 45, and MACRO_PLACE_HALO raised from 12 to 20, detail routing reaches a stable 1780 Lef58EolKeepOut violations on M4. That is down from 4180 with the default RTLMP placement at utilization 25 and halo 12, but it does not clear to zero.

The residual violations are signals that exit the macro's M4 side pins and immediately hit the macro's M4 power straps, the same root cause that keeps cnn, bp_uno, and bp_quad in the asap7 NOT-CACHED cohort.

Timing closes cleanly: TNS/WNS 0, worst slack +3253 ps, fmax 572 MHz (target 200 MHz). This is the best DRC count any HighTide asap7 bsg-fakeram-style design has reached, but the result is still partial: the design ships as NOT-CACHED with the residual count recorded in the results page.

mguthaus · 2026-05-12T14:53:40Z

Update — `031f73a` (2026-05-12)

New commit changes the floorplan strategy. The result is the same class of failure the original analysis described, but the EOL plateau drops from ~4150 to 1780.

Changes

Added designs/asap7/ternip/macro_placement.tcl — hand-placed all 3 macros:
- fakeram7_4096x16 along the bottom (R0 @ 35, 15)
- fakeram7_16x1024 top-left (R0 @ 15, 220)
- fakeram7_1024x16 top-right (R0 @ 235, 220)
- 52 µm halo-to-halo channel between the two upper macros, 37 µm between 4096x16 top and upper-row bottom
CORE_UTILIZATION 25 → 45 (die 515 → 384 µm)
MACRO_PLACE_HALO 12 → 20
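The channel widths and the new die size follow from the placement numbers. The sketch below assumes the macro sizes are unchanged from the floorplan table in the original description, that each R0 origin is the macro's lower-left corner, and that the die side scales with the square root of the utilization ratio (which ignores the core-to-die margin). `halo_gap` is a hypothetical helper.

```python
import math

HALO_UM = 20.0  # MACRO_PLACE_HALO after this change

# name: (origin x, origin y, width, height) in um; sizes are taken from the
# original floorplan table and assumed unchanged by this commit.
MACROS = {
    "fakeram7_4096x16": (35.0, 15.0, 256.0, 128.3),
    "fakeram7_16x1024": (15.0, 220.0, 128.0, 148.8),
    "fakeram7_1024x16": (235.0, 220.0, 128.0, 64.3),
}


def halo_gap(near_edge: float, far_edge: float, halo: float = HALO_UM) -> float:
    """Free channel between two macro edges after subtracting both halos."""
    return far_edge - near_edge - 2 * halo


left_x, upper_y, left_w, _ = MACROS["fakeram7_16x1024"]
right_x = MACROS["fakeram7_1024x16"][0]
_, bottom_y, _, bottom_h = MACROS["fakeram7_4096x16"]

horizontal_channel = halo_gap(left_x + left_w, right_x)
vertical_channel = halo_gap(bottom_y + bottom_h, upper_y)

# Same cell area at higher utilization: area scales by 25/45, side by its root.
new_die_side = 514.9 * math.sqrt(25 / 45)

print(f"horizontal channel: {horizontal_channel:.1f} um")
print(f"vertical channel:   {vertical_channel:.1f} um")
print(f"new die side:       {new_die_side:.0f} um")
```

Under these assumptions the numbers agree with the 52 µm and 37 µm channels and the 384 µm die listed above.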

DRT trajectory

Iter	Old (RTLMP, util 25, halo 12)	New (hand-placed, util 45, halo 20)
0	10579	11710
5	4275	6096
10	4222	5024
15	4184	~4300
22	4184	1955
30	4184	1779
64	(capped @ 4150)	1780

DRT looked worse than the original through iteration 15 (a smaller die gives a denser initial state), then broke through by iteration 22 and settled at 1780, about 43% of the original ~4150 plateau.

Root cause confirmation

99.94% of the remaining 1780 violations are still Lef58EolKeepOut on M4 between signal nets and macro VDD/VSS straps — exactly the same class the original analysis identified, just fewer of them. Sample:

srcs: net:tmatmul.importvector.read_data_q2[107]  net:VSS
bbox = (11.9970, 302.3640) - (12.0750, 302.3880) on Layer M4

The hand placement buys routing room: signals that can escape on M5 and above now have space to do so, but signals that must exit at a macro M4 pin still hit the macro's own internal power straps regardless of layout.
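A violation record like the sample above can be classified mechanically. The parser below is a sketch that only handles the two-line shape shown in the sample (a `srcs:` line and a `bbox = ... on Layer ...` line); the full DRC report format is not shown in this thread, so treating every record as having this shape is an assumption, and `parse_violation` is a hypothetical helper.

```python
import re

SUPPLY_NETS = {"VDD", "VSS"}

BBOX_RE = re.compile(
    r"\(\s*([\d.]+),\s*([\d.]+)\s*\)\s*-\s*"
    r"\(\s*([\d.]+),\s*([\d.]+)\s*\)\s*on Layer\s+(\S+)"
)


def parse_violation(srcs_line: str, bbox_line: str) -> dict:
    """Extract nets, layer and bounding box from one sample-style record."""
    nets = re.findall(r"net:(\S+)", srcs_line)
    match = BBOX_RE.search(bbox_line)
    if match is None:
        raise ValueError(f"unrecognized bbox line: {bbox_line!r}")
    x0, y0, x1, y1 = (float(v) for v in match.group(1, 2, 3, 4))
    return {
        "nets": nets,
        "layer": match.group(5),
        "bbox_um": (x0, y0, x1, y1),
        # True when a signal net conflicts with a power or ground strap.
        "hits_supply": any(net in SUPPLY_NETS for net in nets),
    }


sample = parse_violation(
    "srcs: net:tmatmul.importvector.read_data_q2[107]  net:VSS",
    "bbox = (11.9970, 302.3640) - (12.0750, 302.3880) on Layer M4",
)
print(sample["layer"], sample["hits_supply"], sample["nets"])
```

Counting records where `hits_supply` is true and `layer` is `M4` is one way to reproduce a share like the 99.94% figure, provided the real report follows the same layout.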

The original analysis's open fix (doubling pin pitch via pin_track_count 3 → 6) is still on the table and would address the remaining 1780. Other complementary levers: a pre_route.tcl that adds M3/M4 blockages over each macro halo, or LEF-level changes to expose macro pins on M5.

Other QoR

Timing closes cleanly:

TNS/WNS: 0 / 0
Worst slack: +3253 ps (5 ns clock)
fmax: 572 MHz (target 200 MHz)
Skew: 148 ps
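The fmax figure follows from the clock period and the worst slack. A short check, assuming fmax is computed as the reciprocal of the period minus the worst setup slack (`fmax_mhz` is a hypothetical helper):

```python
def fmax_mhz(clock_period_ps: float, worst_slack_ps: float) -> float:
    """Maximum clock frequency in MHz implied by the worst setup slack."""
    min_period_ps = clock_period_ps - worst_slack_ps
    return 1e6 / min_period_ps  # 1e6 ps per microsecond -> MHz


print(f"{fmax_mhz(5000, 3253):.0f} MHz")
```

With a 5000 ps clock and +3253 ps of slack the minimum period is 1747 ps, which rounds to the 572 MHz reported above.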

Results-page coverage

Results row + gallery card landed on the webpage branch (commit 2e4414b). The schema has no DRC column, so the row reports valid timing/area/power but the 1780 residual count isn't visible — same convention as cnn / bp_uno.

sifferman and others added 2 commits May 8, 2026 17:19

initial commit

b59e000

mguthaus marked this pull request as ready for review May 12, 2026 14:54

mguthaus merged commit c79b0a9 into main May 12, 2026

mguthaus mentioned this pull request May 13, 2026

ternip asap7: hand-placed banked SRAMs + gen_fakeram supply-rail fix → 0 DRC #141

Merged
