Refactor: separate orch phase-stats and swim-lane gating by ChaoWao · Pull Request #857 · hw-native-sys/simpler

ChaoWao · 2026-05-26T01:58:10Z

Summary

Split the two profiling concerns in the PTO2_ORCH_PROFILING branch of
CYCLE_COUNT_LAP_RECORD so each is gated independently:

- Phase-statistics accumulation (acc += _t1 - _t0): gated only by
  PTO2_ORCH_PROFILING at compile time, unconditional at runtime.
- Swim-lane GM writes: gated only by
  orch->l2_perf_level >= L2PerfLevel::ORCH_PHASES at runtime, mirroring the
  PTO2_PROFILING branch and the matching reads in
  l2_perf_collector.cpp / aicpu_executor.cpp.
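The split gating described above can be sketched roughly as follows. This is a minimal model, not the macro from the repository: the macro here takes explicit arguments, and Orch, write_swimlane_record, the fake counter behind get_sys_cnt_aicpu(), the enum values, and the 50-cycle store cost are all assumptions made for illustration.

```cpp
#include <cstdint>

#ifndef PTO2_ORCH_PROFILING
#define PTO2_ORCH_PROFILING 1  // compile-time gate; normally set by the build
#endif

// Placeholder values; the real enum lives in the runtime.
enum class L2PerfLevel : int { OFF = 0, ORCH_PHASES = 1 };

// Hypothetical stand-in for the orchestrator state.
struct Orch {
    L2PerfLevel l2_perf_level = L2PerfLevel::OFF;
    int swimlane_records = 0;  // counts simulated GM writes
};

// Fake cycle counter so the sketch runs on a host machine.
static uint64_t g_fake_cycles = 0;
static uint64_t get_sys_cnt_aicpu() { return g_fake_cycles; }

// Hypothetical GM write; it costs an assumed 50 cycles.
static void write_swimlane_record(Orch* orch) {
    orch->swimlane_records++;
    g_fake_cycles += 50;
}

#if PTO2_ORCH_PROFILING
// Accumulation always runs once compiled in; the write is a runtime decision.
#define CYCLE_COUNT_LAP_RECORD(orch, acc, t0)                        \
    do {                                                             \
        uint64_t _t1 = get_sys_cnt_aicpu();                          \
        (acc) += _t1 - (t0);                                         \
        if ((orch)->l2_perf_level >= L2PerfLevel::ORCH_PHASES) {     \
            write_swimlane_record(orch);                             \
            (t0) = get_sys_cnt_aicpu(); /* re-sample after write */  \
        } else {                                                     \
            (t0) = _t1;                                              \
        }                                                            \
    } while (0)
#else
#define CYCLE_COUNT_LAP_RECORD(orch, acc, t0) do {} while (0)
#endif

struct PhaseTotals {
    uint64_t phase_a;
    uint64_t phase_b;
    int swimlane_records;
};

// Runs two phases of 100 and 200 fake cycles at the given runtime level.
PhaseTotals run_two_phases(L2PerfLevel level) {
    g_fake_cycles = 0;
    Orch state;
    state.l2_perf_level = level;
    Orch* orch = &state;
    PhaseTotals out{0, 0, 0};
    uint64_t t0 = get_sys_cnt_aicpu();
    g_fake_cycles += 100;  // phase A work
    CYCLE_COUNT_LAP_RECORD(orch, out.phase_a, t0);
    g_fake_cycles += 200;  // phase B work
    CYCLE_COUNT_LAP_RECORD(orch, out.phase_b, t0);
    out.swimlane_records = orch->swimlane_records;
    return out;
}
```

In this model, run_two_phases(L2PerfLevel::OFF) and run_two_phases(L2PerfLevel::ORCH_PHASES) report the same phase totals; only the number of swim-lane records differs.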

Why

Previously the PTO2_ORCH_PROFILING branch bundled cycle accumulation
and swim-lane writes behind a single compile-time gate, with two
consequences:

- Callers could not get phase totals without paying GM-store cost on
  every phase boundary.
- The swim-lane write happened between the _t1 capture and the
  _t0 reassignment, so its cost leaked into the next phase's
  accumulator and distorted the totals the flag exists to measure.

After this change the swim-lane write is followed by a fresh
get_sys_cnt_aicpu() for _t0, so its GM-store cost is excluded from
the next phase's accumulator.
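A small worked example of the leak and the fix, using a fake counter. It assumes the old code reused _t1 as the next _t0 after the write (the description implies this but does not show it), and the 100-cycle first phase and 50-cycle GM-store cost are invented numbers.

```cpp
#include <cstdint>

constexpr uint64_t kPhase1Work = 100;  // assumed cycles spent in phase 1
constexpr uint64_t kGmStoreCost = 50;  // assumed cycles for one swim-lane write

struct Totals {
    uint64_t phase1;
    uint64_t phase2;
};

// Old ordering: the write sits between the _t1 capture and _t0 = _t1,
// so phase 2 is measured from a timestamp taken before the write.
Totals measure_old_order(uint64_t phase2_work) {
    uint64_t now = 0;  // fake cycle counter
    uint64_t t0 = now;
    now += kPhase1Work;
    uint64_t t1 = now;              // _t1 capture
    uint64_t phase1 = t1 - t0;
    now += kGmStoreCost;            // swim-lane GM write
    t0 = t1;                        // stale start for phase 2
    now += phase2_work;
    return Totals{phase1, now - t0};
}

// New ordering: _t0 is re-sampled after the write,
// so the store cost falls outside both accumulators.
Totals measure_new_order(uint64_t phase2_work) {
    uint64_t now = 0;
    uint64_t t0 = now;
    now += kPhase1Work;
    uint64_t t1 = now;
    uint64_t phase1 = t1 - t0;
    now += kGmStoreCost;            // swim-lane GM write
    t0 = now;                       // fresh sample after the write
    now += phase2_work;
    return Totals{phase1, now - t0};
}
```

With 200 cycles of phase-2 work, the old ordering reports 250 for phase 2 (200 plus the 50-cycle write) while the new ordering reports 200.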

Mirrored across a2a3 and a5 orchestrator implementations.

Testing

- Default sim build (compile-time gate stays off, so no behavior change)
- PTO2_ORCH_PROFILING=1 build on hardware: phase totals stable
  across runs regardless of --enable-l2-swimlane level
- --enable-l2-swimlane 4 with PTO2_ORCH_PROFILING=1: swim-lane
  records present, totals not inflated by GM-store cost

The PTO2_ORCH_PROFILING branch of CYCLE_COUNT_LAP_RECORD previously bundled per-phase cycle accumulation and swim-lane GM writes behind a single compile-time gate. This had two consequences:

- Callers could not get phase totals without paying GM-store cost.
- The swim-lane write happened between the _t1 capture and the _t0 reassignment, so its cost leaked into the *next* phase's accumulator and distorted the totals the flag exists to measure.

Split the two concerns so each is gated independently:

- Phase-statistics accumulation (`acc += _t1 - _t0`) is gated only by PTO2_ORCH_PROFILING at compile time, unconditional at runtime.
- Swim-lane recording is gated only by `orch->l2_perf_level >= L2PerfLevel::ORCH_PHASES` at runtime, mirroring the PTO2_PROFILING branch and the matching reads in l2_perf_collector / aicpu_executor.
- When the swim-lane write fires, _t0 is re-sampled with a fresh get_sys_cnt_aicpu() *after* the write so the GM-store cost is excluded from the next phase's accumulator.

Mirrored across a2a3 and a5 orchestrator implementations.

gemini-code-assist

Code Review

This pull request updates the profiling macros in pto_orchestrator.cpp across both the a2a3 and a5 runtimes to conditionally gate swim-lane recording at runtime based on l2_perf_level and re-sample the cycle counter after recording to exclude write overhead. The review feedback points out a fragile implicit dependency in the CYCLE_COUNT_START macro, which relies on a local variable named orch being present in the scope. It is recommended to use this->l2_perf_level instead to make the macro self-contained and prevent potential compilation errors.
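The implicit dependency the review describes can be illustrated with a reduced example. The macro names and the Orchestrator struct below are hypothetical, not the ones in pto_orchestrator.cpp; the explicit-parameter variant is an extra alternative shown for comparison and is not something the review proposed.

```cpp
#include <cstdint>

// Placeholder values; the real enum lives in the runtime.
enum class L2PerfLevel : int { OFF = 0, ORCH_PHASES = 1 };

// Fragile: expands correctly only where a variable literally named
// `orch` is in scope at the call site.
#define SWIMLANE_ENABLED_IMPLICIT() \
    (orch->l2_perf_level >= L2PerfLevel::ORCH_PHASES)

// The review's suggestion: refer to the member through `this`.
// Usable only inside non-static member functions.
#define SWIMLANE_ENABLED_THIS() \
    (this->l2_perf_level >= L2PerfLevel::ORCH_PHASES)

// Alternative (an assumption of this sketch): pass the object explicitly.
#define SWIMLANE_ENABLED(o) \
    ((o)->l2_perf_level >= L2PerfLevel::ORCH_PHASES)

// Hypothetical stand-in for the orchestrator class.
struct Orchestrator {
    L2PerfLevel l2_perf_level = L2PerfLevel::OFF;

    bool swimlane_enabled() const { return SWIMLANE_ENABLED_THIS(); }
};

// Compiles only because the parameter happens to be named `orch`.
bool check_implicit(const Orchestrator* orch) {
    return SWIMLANE_ENABLED_IMPLICIT();
}

// Works with any variable name.
bool check_explicit(const Orchestrator* state) {
    return SWIMLANE_ENABLED(state);
}
```

Renaming the parameter of check_implicit to anything other than orch turns the macro expansion into a compile error, which is the fragility the review points at.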

gemini-code-assist Bot reviewed May 26, 2026

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp

Comment thread src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp

ChaoWao merged commit 7035d2a into hw-native-sys:main May 26, 2026
15 checks passed

ChaoWao deleted the refactor/separate-orch-phase-stats-and-swim-lane-g branch May 26, 2026 02:43
