Add: tiered perf_level (0-4) for --enable-l2-swimlane on a5 by indigo1973 · Pull Request #841 · hw-native-sys/simpler

indigo1973 · 2026-05-21T13:08:08Z

Add: tiered perf_level (0-4) for --enable-l2-swimlane on a5

Port the tiered L2 swimlane perf_level feature from #782 (a2a3-only)
to the a5 platform, so a5 onboard and a5sim now honor the integer
perf_level (0-4) instead of treating --enable-l2-swimlane as a plain
boolean.

Mirror the a2a3 wiring on a5:

- L2PerfDataHeader::l2_perf_level carries the level into shared memory;
  AICPU promotes it in l2_perf_aicpu_init and exposes it via
  get_l2_perf_level().
- Host-side L2PerfCollector caches the level to gate JSON sections and
  stamps the JSON "version" field directly from perf_level.
- Apply level gates throughout AICPU code paths: skip dispatch/finish
  timestamps and fanout copies below AICPU_TIMING, scheduler phase
  records below SCHED_PHASES, and orchestrator phase records below
  ORCH_PHASES.
- Plumb perf_level through DeviceRunner / pto_runtime_c_api on both
  onboard and sim backends.
- Move l2_perf_aicpu_init out of the dispatch one-time-init block into
  SchedulerContext::init() in scheduler_cold_path.cpp, matching a2a3
  so the orchestrator thread reads a promoted level when caching
  rt->orchestrator.l2_perf_level.
- Align l2_perf_aicpu_record_phase to a2a3 byte-for-byte: remove the
  end-of-function wmb() and the 3 dropped-path wmbs (all introduced
  by #777, none present in a2a3), and unify the accounting comment
  + brace style. Measured ~1.1 ms reduction in L4 orch_cost on
  paged_attention_unroll Case1.
- Align l2_perf_aicpu_complete_record with a2a3: add thread_idx
  parameter (routed from both host_build_graph and tensormap_and_ringbuffer
  callers), introduce an AICPU-private s_perf_records_buffers[] cache as
  the records-buffer SoT, rename switch_buffer -> switch_records_buffer
  and rotate after the write so the just-committed record is preserved,
  and surface ring/task_id mismatch as a dedicated LOG_ERROR
  (completion-before-dispatch invariant violation) separate from
  capacity drops. init / flush_buffers maintain s_perf_records_buffers[]
  in lockstep with state->current_buf_ptr so flush deterministically
  halts subsequent commits.
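
The level gates described above can be sketched as a simple threshold comparison. This is a minimal illustration, not the PR's code: AICPU_TIMING, SCHED_PHASES and ORCH_PHASES are named in the description, while DISABLED, AICORE_TIMING, the numeric values, and the helpers to_level, at_least and gates_for are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed numbering of the tiered levels. AICPU_TIMING, SCHED_PHASES and
// ORCH_PHASES are named in the PR; DISABLED, AICORE_TIMING and the exact
// numeric values are assumptions made for this sketch.
enum class L2PerfLevel : std::uint8_t {
    DISABLED = 0,
    AICORE_TIMING = 1,
    AICPU_TIMING = 2,
    SCHED_PHASES = 3,
    ORCH_PHASES = 4,
};

// Hypothetical helper: convert the integer perf_level (0-4) into the enum.
inline L2PerfLevel to_level(int raw) {
    assert(raw >= 0 && raw <= 4);
    return static_cast<L2PerfLevel>(raw);
}

// A record type is collected only when the active level reaches its tier.
constexpr bool at_least(L2PerfLevel active, L2PerfLevel required) {
    return static_cast<std::uint8_t>(active) >= static_cast<std::uint8_t>(required);
}

// Hypothetical summary of which AICPU-side records a given level enables.
struct Gates {
    bool aicpu_timestamps_and_fanout;  // dispatch/finish timestamps, fanout copies
    bool sched_phase_records;          // scheduler phase records
    bool orch_phase_records;           // orchestrator phase records
};

inline Gates gates_for(L2PerfLevel active) {
    return Gates{
        at_least(active, L2PerfLevel::AICPU_TIMING),
        at_least(active, L2PerfLevel::SCHED_PHASES),
        at_least(active, L2PerfLevel::ORCH_PHASES),
    };
}
```

Under these assumed values, a level of 3 would enable AICPU timestamps and scheduler phase records but still skip orchestrator phase records.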

Update docs (l2-swimlane-profiling.md, profiling_levels.md,
testing.md) to drop the "a5 is boolean-only" caveat and document the
unified integer interface across a2a3 and a5.

gemini-code-assist

Code Review

This pull request implements a tiered performance profiling system (L2PerfLevel 0–4) for the a5 platform, transitioning from a binary toggle to granular collection levels including AICore timing, AICPU timing, scheduler phases, and orchestrator phases. The changes involve updating the shared-memory handshake, gating timing and fanout collection in the AICPU executor and scheduler, and updating documentation. A critical feedback point identifies a memory ordering risk in the phase recording logic where a removed memory barrier should be replaced with a store barrier to ensure data consistency on weak memory model architectures.
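
The memory-ordering concern raised in the summary can be illustrated in general terms with a single-producer publish pattern. The sketch below uses C++11 std::atomic release/acquire ordering as a stand-in for a store barrier; it is not the code under review, and PhaseRecord, PhaseBuffer, record_phase and read_committed are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical record layout; the real struct in the PR is not shown here.
struct PhaseRecord {
    std::uint64_t start;
    std::uint64_t end;
    std::uint32_t phase_id;
};

constexpr std::uint32_t kCapacity = 8;

// Hypothetical shared buffer: records plus a count of committed entries.
struct PhaseBuffer {
    PhaseRecord records[kCapacity];
    std::atomic<std::uint32_t> count{0};
};

// Single-producer writer. The payload is written first; the release store
// on count then publishes it, so a reader that observes the new count is
// guaranteed to also observe the record contents.
inline bool record_phase(PhaseBuffer& buf, const PhaseRecord& rec) {
    const std::uint32_t idx = buf.count.load(std::memory_order_relaxed);
    if (idx >= kCapacity) {
        return false;  // buffer full: the record is dropped
    }
    buf.records[idx] = rec;
    buf.count.store(idx + 1, std::memory_order_release);
    return true;
}

// Reader. The acquire load pairs with the writer's release store.
inline std::uint32_t read_committed(const PhaseBuffer& buf, PhaseRecord* out) {
    const std::uint32_t n = buf.count.load(std::memory_order_acquire);
    for (std::uint32_t i = 0; i < n; ++i) {
        out[i] = buf.records[i];
    }
    return n;
}
```

Without the release ordering, a weakly ordered CPU may make the incremented count visible before the record payload, which is the kind of inconsistency the review comment points at.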


gemini-code-assist Bot reviewed May 21, 2026

Comment thread src/a5/platform/src/aicpu/l2_perf_collector_aicpu.cpp

indigo1973 force-pushed the swim_0521 branch 2 times, most recently from 5440386 to 8338184 Compare May 25, 2026 06:43

ChaoZheng109 reviewed May 25, 2026

Comment thread docs/dfx/l2-swimlane-profiling.md Outdated

indigo1973 force-pushed the swim_0521 branch from 8338184 to 9e225b3 Compare May 25, 2026 07:30

indigo1973 wants to merge 1 commit into hw-native-sys:main from indigo1973:swim_0521
