
Refactor: rename sche_cpu_num to aicpu_thread_num for honest semantics #854

Merged
poursoul merged 1 commit into hw-native-sys:main from ChaoWao:investigate/aicpu-sched-thread-naming
May 26, 2026


Collaborator

@ChaoWao ChaoWao commented May 25, 2026

Summary

Pure rename, no behavior change. `Runtime::sche_cpu_num` was a legacy misnomer — the name implied "scheduler thread count" but the value was always the total AICPU thread count (orch + schedulers). The orch/sched split lived in `AicpuExecutor::init` as `sched_thread_num_ = thread_num_ - 1`, with the `-1` carving out the orchestrator thread. The runtime field name lied; the local `-1` did the bookkeeping the rename never did.

Renames

  • `Runtime::sche_cpu_num` → `Runtime::aicpu_thread_num` (matches the user-facing `CallConfig::aicpu_thread_num` field — same name end-to-end)
  • `AicpuExecutor::thread_num_` → `AicpuExecutor::aicpu_thread_num_`
  • `SchedulerContext::thread_num_` → `SchedulerContext::aicpu_thread_num_` (and the `init()` parameter)
  • `LOG_ERROR("Invalid thread_num:")` → `("Invalid aicpu_thread_num:")`
  • Field doc-comment expanded to describe the orch/sched split (trb) or the no-split round-robin (hbg)

`sched_thread_num_` (the member holding `aicpu_thread_num_ - 1`) keeps its name — it accurately means "scheduler subset," and renaming it would only confuse the contrast with the new total field.

Drive-by

Two `platform_aicpu_affinity.h` files were missing the standard copyright header — surfaced by `check-headers` because the comment inside them was touched. Added the standard header to both (pre-existing omission, but no-skip rule applies).

Why

After PR #850, the user-facing field is `CallConfig::aicpu_thread_num` (clear: total) but it landed in `Runtime::sche_cpu_num` (confusing: implies scheduler-only). Reading `aicpu_executor.cpp:182-183` you have to figure out from context that `sche_cpu_num` means "total" — only the `-1` on the next line tells you. After this rename the same lines read:
```cpp
aicpu_thread_num_ = runtime->aicpu_thread_num;
sched_thread_num_ = aicpu_thread_num_ - 1; // -1 because the last thread is the orchestrator
```
Self-documenting.
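As a rough illustration of the split those two lines encode, the sketch below derives the scheduler count from the total and classifies a thread index by role. `ThreadRole`, `ThreadSplit`, `make_split`, and `role_of` are hypothetical names invented for this example, not identifiers from the codebase, and the sketch only covers the orch/sched split case (trb), not the no-split round-robin (hbg).

```cpp
// Hypothetical types for illustration only; not the project's actual code.
enum class ThreadRole { Scheduler, Orchestrator };

struct ThreadSplit {
    int aicpu_thread_num;  // total AICPU threads (orchestrator + schedulers)
    int sched_thread_num;  // scheduler subset: total minus the orchestrator
};

// Derive the scheduler count from the total, mirroring the two lines above.
constexpr ThreadSplit make_split(int aicpu_thread_num) {
    return ThreadSplit{aicpu_thread_num, aicpu_thread_num - 1};
}

// The last thread index is the orchestrator; every earlier index is a scheduler.
constexpr ThreadRole role_of(const ThreadSplit& split, int thread_idx) {
    return thread_idx == split.aicpu_thread_num - 1 ? ThreadRole::Orchestrator
                                                    : ThreadRole::Scheduler;
}

static_assert(make_split(4).sched_thread_num == 3, "4 threads leave 3 schedulers");
static_assert(role_of(make_split(4), 3) == ThreadRole::Orchestrator,
              "the last thread is the orchestrator");
```

With the total named `aicpu_thread_num`, the `- 1` reads as "minus the orchestrator" without any extra context.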

Blast radius

28 files, +164/−122 across `a5` + `a2a3` × `tensormap_and_ringbuffer` + `host_build_graph`. ChipWorker dlsym ABI is unchanged (the field rename is internal to the host/runtime contract; no exported symbol changed).

Test plan

  • Local sim build (`pip install --no-build-isolation -e .`) clean
  • `tests/ut/py/test_chip_worker.py` passes (14 tests)
  • `tests/st/a5/.../spmd_basic` passes on sim (covers both Case1 with explicit block_dim and Case2_AutoBlockDim via the new path)
  • All pre-commit hooks (clang-format, clang-tidy, cpplint, pyright, etc.) clean
  • Onboard CI (st-onboard-a5 + st-onboard-a2a3) — needs CI run


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request renames the variable `sche_cpu_num` to `aicpu_thread_num` across the a2a3 and a5 platforms to clarify that it represents the total number of AICPU threads launched. The changes span multiple components, including the runtime, device runners, and executors. Additionally, copyright headers were added to the affinity header files. The review feedback identifies a logical issue in the initialization sequence within `aicpu_executor.cpp` for both platforms: `sched_thread_num_` is calculated before a defensive fixup for a zero thread count, potentially resulting in an incorrect value of -1 when the total thread count is adjusted to 1.

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
Comment thread src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
Runtime::sche_cpu_num was a legacy misnomer — its name implied
"scheduler thread count" but its value was always the *total* AICPU
thread count (orch + schedulers). AicpuExecutor::init had to spell out
the split with `sched_thread_num_ = thread_num_ - 1`, where the -1
carved out the orchestrator thread. The runtime field name lied; the
local variable's -1 did the bookkeeping the rename never did.

Renames (no behavior change):

- Runtime::sche_cpu_num → Runtime::aicpu_thread_num (matches the
  user-facing CallConfig::aicpu_thread_num field)
- AicpuExecutor::thread_num_ → AicpuExecutor::aicpu_thread_num_
- SchedulerContext::thread_num_ → SchedulerContext::aicpu_thread_num_
  (and the init() parameter)
- LOG_ERROR("Invalid thread_num:") → ("Invalid aicpu_thread_num:")
- Comment on Runtime::aicpu_thread_num expanded to describe the
  orch/sched split (trb) or the no-split round-robin (hbg)

sched_thread_num_ (the AicpuExecutor / SchedulerContext member that
holds aicpu_thread_num_ - 1) keeps its name — it accurately means
"scheduler subset," and renaming it would only confuse the contrast
with the new total field.

Drive-bys surfaced by reviewers / hooks:

- Reorder the aicpu_thread_num_==0 fixup in AicpuExecutor::init to run
  *before* sched_thread_num_ is derived. The pre-existing order left
  sched_thread_num_ at -1 in the (currently unreachable) zero-input
  edge case while aicpu_thread_num_ itself was corrected to 1.
- Add the standard copyright header to two platform_aicpu_affinity.h
  files (pre-existing omission flagged by check-headers).

Mirrored across a5 + a2a3 × tensormap_and_ringbuffer + host_build_graph.
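
The ordering issue behind the first drive-by can be sketched as follows. This is a minimal model, not the actual `AicpuExecutor::init` body: `ThreadCounts` and both function names are hypothetical, the exact form of the zero fixup is assumed from the description above, and the code needs C++14 or later for the relaxed `constexpr` rules.

```cpp
// Hypothetical stand-in for the two AicpuExecutor members discussed above.
struct ThreadCounts {
    int aicpu_thread_num;  // total AICPU threads
    int sched_thread_num;  // scheduler subset (total - 1)
};

// Pre-existing order: derive the scheduler count first, then fix up the total.
constexpr ThreadCounts init_derive_then_fixup(int requested) {
    ThreadCounts c{requested, requested - 1};  // derived from the unfixed total
    if (c.aicpu_thread_num == 0) {
        c.aicpu_thread_num = 1;                // total corrected, derived value stays stale
    }
    return c;
}

// Reordered: fix up the total first, then derive the scheduler count from it.
constexpr ThreadCounts init_fixup_then_derive(int requested) {
    ThreadCounts c{requested, 0};
    if (c.aicpu_thread_num == 0) {
        c.aicpu_thread_num = 1;
    }
    c.sched_thread_num = c.aicpu_thread_num - 1;
    return c;
}

// Zero input: the old order leaves the scheduler count at -1.
static_assert(init_derive_then_fixup(0).aicpu_thread_num == 1, "total is fixed up");
static_assert(init_derive_then_fixup(0).sched_thread_num == -1, "derived value is stale");
// The new order keeps both values consistent.
static_assert(init_fixup_then_derive(0).sched_thread_num == 0, "derived after fixup");
```

For any nonzero input the two orders agree, which is consistent with the reorder being described as touching only the zero-input edge case.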
@ChaoWao ChaoWao force-pushed the investigate/aicpu-sched-thread-naming branch from 215312b to f96a12a Compare May 25, 2026 09:27
@poursoul poursoul merged commit 993a3e4 into hw-native-sys:main May 26, 2026
15 checks passed
2 participants