Update: auto-resolve CallConfig.block_dim to max stream capacity by ChaoWao · Pull Request #850 · hw-native-sys/simpler

ChaoWao · 2026-05-25T01:38:43Z

Summary

CallConfig::block_dim default flips from 24 to 0 (sentinel for "auto"); DeviceRunner::run resolves 0 at launch time to the largest block_dim the AICore stream can host.
Onboard runners query aclrtGetStreamResLimit(CUBE_CORE / VECTOR_CORE) and pick min(cube/AIC_per_bd, vector/AIV_per_bd), falling back to PLATFORM_MAX_BLOCKDIM if the ACL call fails. Sim runners short-circuit to PLATFORM_MAX_BLOCKDIM since there is no per-stream resource query.
scene_test.py's implicit block_dim fallback (1) is dropped in favour of 0 so cases that omit block_dim exercise the new auto path.
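The onboard resolution rule above can be sketched as follows. This is a minimal illustration, not the PR's code: it does not call ACL. The hypothetical `StreamLimits` struct stands in for the CUBE_CORE / VECTOR_CORE values returned by aclrtGetStreamResLimit, and `std::nullopt` models a failed call. The constants are assumptions (in particular, 1 AIC and 2 AIV cores per block), and `resolve_block_dim` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical values for illustration only; the real per-platform constants
// (PLATFORM_MAX_BLOCKDIM, AIC/AIV per block) live in the platform headers.
constexpr uint32_t kPlatformMaxBlockDim = 24;  // stands in for PLATFORM_MAX_BLOCKDIM
constexpr uint32_t kAicPerBlockDim = 1;        // assumed AIC cores per block
constexpr uint32_t kAivPerBlockDim = 2;        // assumed AIV cores per block

// Hypothetical stand-in for the per-stream limits reported by ACL.
struct StreamLimits {
    uint32_t cube_cores;    // CUBE_CORE limit
    uint32_t vector_cores;  // VECTOR_CORE limit
};

// Largest block_dim such that every block still gets its AIC and AIV cores.
// Falls back to the static platform cap when the ACL query failed (nullopt).
// Clamping a successful result to the platform cap is a separate validation step.
uint32_t query_max_block_dim(const std::optional<StreamLimits>& limits) {
    if (!limits) {
        return kPlatformMaxBlockDim;
    }
    const uint32_t by_cube = limits->cube_cores / kAicPerBlockDim;
    const uint32_t by_vector = limits->vector_cores / kAivPerBlockDim;
    return std::min(by_cube, by_vector);
}

// block_dim == 0 is the "auto" sentinel; explicit positive values pass through.
uint32_t resolve_block_dim(uint32_t requested, const std::optional<StreamLimits>& limits) {
    return requested == 0 ? query_max_block_dim(limits) : requested;
}
```

Taking the minimum over both core types means a stream that is short on vector cores limits block_dim even when cube cores are plentiful.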

Behavior change

Any caller constructing CallConfig() without setting block_dim previously got 24. They now get the stream's maximum: 36 on a full-die a5 stream, 24 on a2a3, and possibly less on resource-limited streams.
validate_block_dim now applies the PLATFORM_MAX_BLOCKDIM clamp on both the ACL-success path and the ACL-fallback path (previously only the fallback was clamped). This is a defensive narrowing: the runtime handshake/scheduler arrays are statically sized to RUNTIME_MAX_WORKER = PLATFORM_MAX_BLOCKDIM * PLATFORM_CORES_PER_BLOCKDIM, so accepting an ACL-reported value larger than the platform cap would overrun those static arrays. There is no observed impact on current hardware, but explicit values above the platform cap now fail early instead of corrupting state later.
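A minimal sketch of why the clamp has to apply on both paths, assuming hypothetical constants (24 blocks, 3 cores per block, i.e. 1 AIC + 2 AIV). `acl_max` models the ACL-derived limit, with `std::nullopt` for a failed query. The `g_worker_core_ids` array and `assign_workers` helper are hypothetical stand-ins for the statically sized handshake/scheduler arrays; this is not the runtime's actual validate_block_dim.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

constexpr uint32_t kPlatformMaxBlockDim = 24;      // stands in for PLATFORM_MAX_BLOCKDIM
constexpr uint32_t kPlatformCoresPerBlockDim = 3;  // stands in for PLATFORM_CORES_PER_BLOCKDIM
constexpr uint32_t kRuntimeMaxWorker = kPlatformMaxBlockDim * kPlatformCoresPerBlockDim;

// Per-worker state sized at compile time, like the runtime's static arrays.
static uint32_t g_worker_core_ids[kRuntimeMaxWorker];

// Clamps the ACL-reported limit (if any) to the platform cap, then range-checks.
// Without the clamp on the success path, an ACL value above the cap would pass
// here and overrun g_worker_core_ids in assign_workers().
uint32_t validate_block_dim(uint32_t block_dim, std::optional<uint32_t> acl_max) {
    const uint32_t limit =
        acl_max ? std::min(*acl_max, kPlatformMaxBlockDim) : kPlatformMaxBlockDim;
    if (block_dim == 0 || block_dim > limit) {
        throw std::invalid_argument("block_dim " + std::to_string(block_dim) +
                                    " outside [1, " + std::to_string(limit) + "]");
    }
    return block_dim;
}

// Fills one slot per (block, core) pair; in bounds only because
// validate_block_dim guarantees block_dim <= kPlatformMaxBlockDim.
uint32_t assign_workers(uint32_t block_dim) {
    const uint32_t n = block_dim * kPlatformCoresPerBlockDim;
    for (uint32_t i = 0; i < n; ++i) {
        g_worker_core_ids[i] = i;
    }
    return n;
}
```

With this shape, an explicit block_dim of 30 on a stream that reports 48 is rejected at validation rather than writing past the end of the worker array.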

Why

Most callers want "use everything the stream allows" and were hand-picking 24 arbitrarily. Centralizing the policy in DeviceRunner removes the magic number from user code and adapts automatically when stream capacity is constrained (CPU partitioning, model-shared streams, etc.).

Test plan

Local sim build (pip install --no-build-isolation -e .) passes
tests/ut/py/test_chip_worker.py::TestCallConfig updated for new default; pre-commit hooks (clang-tidy / cpplint / pyright) clean
spmd_basic ST gains a Case2_AutoBlockDim on both a5 and a2a3 that omits block_dim, so onboard CI exercises query_max_block_dim end-to-end
simpler_setup/scene_test.py fallback flipped from 1 to 0 so future cases that omit block_dim also exercise the auto path

gemini-code-assist

Code Review

This pull request introduces an "auto" resolution feature for block_dim by changing its default value from 24 to 0. When set to 0, the DeviceRunner dynamically determines the maximum allowable block_dim based on stream resource limits or platform constants. Feedback from the reviewer highlights a loss of diagnostic logging information regarding specific core limits during validation and identifies performance redundancies where resource limits are queried multiple times during the launch path.

Flip CallConfig::block_dim default from 24 to 0, and treat 0 as a sentinel that DeviceRunner resolves at run() time. Onboard runners ask aclrtGetStreamResLimit (CUBE_CORE / VECTOR_CORE) for the per-stream cap and pick min(cube/AIC_per_bd, vector/AIV_per_bd); sim runners use the static PLATFORM_MAX_BLOCKDIM. Existing explicit positive values still go through the same validation path unchanged.

- Refactor onboard validate_block_dim to share query_max_block_dim with the auto-resolution path (a5 + a2a3)
- Sim runners short-circuit block_dim == 0 to PLATFORM_MAX_BLOCKDIM before the range check (a5 + a2a3)
- Update getting-started and chip-level-arch examples to show the auto default; test_chip_worker now asserts block_dim defaults to 0
- Document the sentinel in CallConfig header and ChipWorker.run docstring
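The sim-runner short-circuit described in the commit message can be sketched like this. It assumes a hypothetical one-field `CallConfig` (the real struct carries more fields) and an illustrative cap of 24. `sim_resolve_block_dim` is a hypothetical name. The point is ordering: the sentinel is replaced before the range check, so 0 never reaches validation.

```cpp
#include <cstdint>
#include <stdexcept>

constexpr uint32_t kPlatformMaxBlockDim = 24;  // stands in for PLATFORM_MAX_BLOCKDIM

// Hypothetical, reduced CallConfig: only the field this PR changes.
struct CallConfig {
    uint32_t block_dim = 0;  // 0 = "auto": resolved by the runner at launch
};

// Sim runners have no per-stream resource query, so the sentinel maps straight
// to the static platform cap, and only then is the range check applied.
uint32_t sim_resolve_block_dim(uint32_t block_dim) {
    if (block_dim == 0) {
        block_dim = kPlatformMaxBlockDim;
    }
    if (block_dim > kPlatformMaxBlockDim) {
        throw std::out_of_range("block_dim exceeds PLATFORM_MAX_BLOCKDIM");
    }
    return block_dim;
}
```

A default-constructed config therefore resolves to the full platform cap on sim, while explicit positive values keep their previous behavior.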

gemini-code-assist Bot reviewed May 25, 2026


Comment thread on src/a2a3/platform/onboard/host/device_runner.cpp (outdated)

Comment thread on src/a2a3/platform/onboard/host/device_runner.cpp (outdated)

Comment thread on src/a5/platform/onboard/host/device_runner.cpp (outdated)

Comment thread on src/a5/platform/onboard/host/device_runner.cpp (outdated)

ChaoWao force-pushed the feat/auto-resolve-block-dim-from-stream-limit branch 2 times, most recently from d81ca56 to a3cf9ad, May 25, 2026 02:14

ChaoWao force-pushed the feat/auto-resolve-block-dim-from-stream-limit branch from a3cf9ad to 08f3fba, May 25, 2026 04:34

ChaoWao merged commit b72df25 into hw-native-sys:main May 25, 2026
15 checks passed

ChaoWao deleted the feat/auto-resolve-block-dim-from-stream-limit branch May 25, 2026 06:23

ChaoWao mentioned this pull request May 25, 2026 in #854, "Refactor: rename sche_cpu_num to aicpu_thread_num for honest semantics" (open, 5 tasks)

ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:feat/auto-resolve-block-dim-from-stream-limit
