Skip to content

support vpto backend#699

Merged
zhangstevenunity merged 266 commits into
hw-native-sys:mainfrom
mouliangyu:feature-vpto-backend-merge
May 29, 2026
Merged

support vpto backend#699
zhangstevenunity merged 266 commits into
hw-native-sys:mainfrom
mouliangyu:feature-vpto-backend-merge

Conversation

@mouliangyu
Copy link
Copy Markdown
Contributor

@mouliangyu mouliangyu commented May 25, 2026

Summary

This PR adds the VPTO backend to PTOAS. The new backend is selected with --pto-backend=vpto; the default backend remains emitc.

Main additions:

  • VPTO IR/types/ops and VPTO-specific verification/lowering support.
  • VPTO backend pipeline in tools/ptoas, including VPTO module normalization, TileLang template expansion, VPTO pointer normalization, wrapper-op expansion, LLVM lowering, host stub emission, and fatobj emission.
  • TileLang DSL package, TileOp template libraries, and related lit/ST coverage.
  • CMake/CTest integration for TileLang DSL tests and install support for ptoas.

Impact on non-VPTO code paths

The default ptoas path still uses the emitc backend. VPTO split/normalize, TileLang expand, VPTO pointer cleanup, VPTO LLVM emission, and fatobj emission are only entered through the VPTO backend branch or explicitly registered passes.

This PR does change shared PTO IR and shared passes used by non-VPTO flows:

  • !pto.ptr now carries a PTO memory space. The parser/printer accepts forms such as !pto.ptr<T, gm> and !pto.ptr<T, ub>, with GM as the default. getPTOAddressSpaceAttr(Type) now reads the memory space from PtrType, and PTOViewToMemref preserves that memory space when lowering ptr types to memrefs.
  • Shared parser/printer behavior changes for several PTO ops. set_flag, wait_flag, get_buf, and rls_buf now use custom parsers/printers. get_buf/rls_buf also accept pipe attrs in addition to the previous sync op type attrs. castptr, get_tensor_view_stride, and mem_bar are added as shared IR surface.
  • Several tile ops gain a default precision_mode attr: tcolexpanddiv, tdiv, tdivs, texp, tlog, trecip, trowexpanddiv, trsqrt, and tsqrt. The verifier requires HIGH_PRECISION to be used only with f16/f32 element types.
  • Shared verifiers are tightened for existing ops: castptr validates supported cast forms and memory-space consistency; A5 tload/tstore validate vec layouts; shift-like tile ops require src0/src1/dst element types to match; scalar tile ops such as tadds/tmuls/tsubs require valid rows to match on A5; tpart* ops validate source valid shapes against destination valid shapes; tsels validates row-major layout for mask/src/dst on A5.
  • Shared lowering changes include PTOViewToMemref preserving ptr memory spaces, inserting pto.bind_tile anchors for original tile_buf function arguments, lowering get_tensor_view_stride, and preserving attrs when rebuilding selected tile ops. PTOPlanMemory handles null alias inputs defensively. PTOToEmitC maps the extended ReluPreMode values ScalarRelu, VectorRelu, and Pwl.

Validation

Local validation run on this branch:

  • ninja -C build-ci-local ptoas ptobc install
  • ninja -C build-ci-local check-pto (416/416)
  • sample CI (OK=263 FAIL=0 SKIP=16)
  • docker/test_ptoas_cli.sh
  • TileLang DSL unit tests (312 tests)
  • VPTO SIM validation (281/281)
  • TileLang ST simulator smoke (85/85)

docker/collect_ptoas_dist.sh was checked locally and failed because the script only collects LLVM libraries from the CI LLVM workspace layout; the local LLVM installation is under the actions-runner tool cache.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly expands the project's documentation by adding detailed ISA specifications, design documents for core operations like mad and acc_store, and a framework for TileLang ST validation. It also integrates the PTO-Gym submodule, updates build and Docker configurations for TileLang DSL support, and introduces a script for automated release document generation. The review feedback identifies several areas for improvement in the generation script, including more robust version string parsing, avoiding incorrect heading demotion within code blocks, replacing fragile string matching with structured markers, and ensuring anchor uniqueness to prevent broken links in merged documentation.

notes[version] = "Release refresh"

def key_fn(item: str) -> tuple[int, ...]:
return tuple(int(part) for part in item.split("."))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This list comprehension will raise a ValueError if any part of the version string is not a digit (e.g., "0.4-beta"). Since this script is used for release documentation generation, consider adding a check to ensure only numeric parts are converted to integers to avoid runtime crashes.

new_level = min(6, len(hashes) + levels)
return f"{'#' * new_level} {heading}"

return re.sub(r"^(#{1,6})\s+(.*)$", replace, text, flags=re.MULTILINE)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex r"^(#{1,6})\s+(.*)$" used with re.MULTILINE matches any line starting with a hash followed by a space. This can incorrectly demote comments inside fenced code blocks (e.g., in Python or Bash snippets) if they are at the start of a line. It is recommended to use a more robust way to identify headings that are not part of code blocks, such as splitting the text by code block markers before applying the substitution.

Comment on lines +175 to +178
if "For detailed semantics, C-style pseudocode, and CCE mappings" in line:
continue
if "CCE correspondence" in line or "builtin mapping" in line.lower():
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for stripping unwanted lines relies on exact string matches for prose, which is fragile. If the source document (vpto-spec.md) is updated with slightly different wording, these lines will no longer be stripped. Consider using more stable markers or a structured approach to identify content to be removed.

Comment on lines +212 to +213
if anchor_suffix:
return f"]({anchor_suffix})"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When an intra-bundle link contains an anchor (e.g., [Foo](chapter.md#anchor)), it is rewritten to ](#anchor). This assumes that the anchor name is unique across all inlined chapters in the merged document. If multiple chapters use the same anchor names, these links may break or point to the wrong location. Consider prefixing internal anchors with the chapter name during the merge process to ensure uniqueness.

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 25, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: support vpto backend #699 support vpto backend
  • Author: mouliangyu
  • Base/Head: main / feature-vpto-backend-merge
  • Head SHA: 21ad259ab6aa
  • Trigger: PR 有新提交
  • Generated At: 2026-05-29T05:02:22Z
  • Previous Head SHA: db2bf1f7ba2d
  • Status: completed

Summary

检查到 PR #699 存在 4 个高优先级问题:.ptobc 的目标架构会被错误覆盖、release/nightly 发布流程被硬禁用、多个新 VPTO CLI 选项实际无效,以及 --pto-seam-ir-file=- 会污染默认 stdout 输出。

Findings

  1. P1 `.ptobc` 输入里编码的 `pto.target_arch` 会在主流水线前被错误改回默认值 tools/ptoas/ptoas.cpp:1826

这里先在 1647-1652 行保留了解码后模块自带的 pto.target_arch,但 1826 行又无条件把模块属性重写成局部变量 arch。对 .ptobc 输入来说,arch 并不会从字节码内容里重新推导,未显式传 --pto-arch 时它仍是默认的 a3。结果是任何编码为 A5 的 .ptobc 模块都会在后续 verifier / lowering / codegen 中按 A3 处理,属于现有字节码兼容性和正确性回归。

  1. P1 release/nightly 产物发布流程被 `if: false` 直接禁用了 .github/workflows/build_wheel.yml:232

Linux workflow 这里把 upload_release_assets 整个 job 关掉了,而且同一个文件前面的 wheel 生成/上传步骤也都被 if: false 禁用;macOS workflow 也做了同样的改动。这样在 releaseschedule 事件上,workflow 仍会触发,但不会发布 wheel,也不会把二进制包上传到 GitHub Release,后续依赖这个 job 的版本 bump 也会一起跳过。相对 origin/main,这相当于把正式发布链路变成了空操作。

  1. P2 多个对外暴露的 VPTO CLI 选项实际上没有接入任何行为 tools/ptoas/ptoas.cpp:387

--vpto-print-ir--dump-vpto-ir--vpto-lowering-strategy 都出现在 CLI 帮助里,但实现里没有真正消费它们。buildVPTOEmissionOptions() 还在 1447 行把 dumpVPTOIR 硬编码成 falserunVPTOBackendPipeline() 也完全没有读取 lowering strategy,因此用户即使显式请求 IR dump 或 no-post-update 策略,最终仍然只能得到默认行为。这是明确的 CLI contract mismatch。

  1. P2 `--pto-seam-ir-file=-` 会把 seam IR 和最终输出混写到同一个 stdout tools/ptoas/ptoas.cpp:480

emitSharedPreBackendSeamIR() 在参数为 - 时直接向 llvm::outs() 打印 seam IR;而主输出 -o 的默认值本来也是 -。因此用户只要执行 ptoas --pto-backend=vpto --pto-seam-ir-file=- ...,stdout 上就会先出现文本 seam IR,再接最终的 VPTO IR 或 fatobj,输出被污染成不可用内容,除非调用者额外把 -o 手工重定向到别的文件。

@mouliangyu mouliangyu force-pushed the feature-vpto-backend-merge branch from 2a4ccae to db2bf1f Compare May 26, 2026 14:35
@FangRui0
Copy link
Copy Markdown
Contributor

/run a3

@reedhecre
Copy link
Copy Markdown

已接收 /run a3,A3 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre
Copy link
Copy Markdown

A3 板测失败

  • 触发方式:manual
  • 源码提交:03089d80ad9c
  • 结果汇总:OK 217 / FAIL 2 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260527_104105_manual_pr699.log
  • 手动指令:/run a3
  • 触发人:FangRui0
  • 触发评论:support vpto backend #699 (comment)
  • 失败阶段:board-validation / exit=1

失败用例

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre
Copy link
Copy Markdown

A3 板测失败详情:PR #699

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260527_104105_manual_pr699/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 4175868] 2026-05-27-11:26:05.538.466 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 146, there is an exception of aicore error, core id is 17, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000188, vec error info: 0, mte error info: 0x7a06000033, ifu error info: 0x212c200024600, ccu error info: 0x40a01900778000d8, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x6000033, fixp_error1 info: 0x7a, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-27 11:26:06] ERROR: testcase failed (exit 1): syncall_binding
tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260527_104105_manual_pr699/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 348335] 2026-05-27-11:26:46.014.212 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 147, there is an exception of aivec error, core id is 34, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000160, vec error info: 0x1e000000a8, mte error info: 0xa50312b08b, ifu error info: 0x2dfffc0e34f40, ccu error info: 0x40a00e0000000052, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x312b08b, fixp_error1 info: 0xa5, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-27 11:26:48] ERROR: testcase failed (exit 1): tprefetch_async_binding

@FangRui0
Copy link
Copy Markdown
Contributor

/run a5

@reedhecre
Copy link
Copy Markdown

已接收 /run a5,A5 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre
Copy link
Copy Markdown

A5 板测失败

  • 触发方式:manual
  • 源码提交:1efacfbfe595
  • 结果汇总:OK 213 / FAIL 6 / SKIP 1
  • 日志:/root/ptoas-board-monitor-a5/logs/20260528_150006_manual_pr699.log
  • 手动指令:/run a5
  • 触发人:FangRui0
  • 触发评论:support vpto backend #699 (comment)
  • 失败阶段:board-validation / exit=1

失败用例

  • tprefetch_async_binding (run, exit=1)
  • syncall_binding (run, exit=1)
  • test_intercore_sync_a5_dyn (run, exit=139)
  • matMul (run, exit=139)
  • cmps (run, exit=2)
  • cmp (run, exit=2)

@reedhecre
Copy link
Copy Markdown

A5 板测失败详情:PR #699

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260528_150006_manual_pr699/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 379865] 2026-05-28-15:08:16.690.774 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 11, there is an aivec error exception, core id is 0, error code = 271, dump info: pc start: 0x100040800000, current: 0x1000408001a0, sc error info: 0xffffffffffff, su error info: 0xe6ffd23d139c0059,0xcc3fd0e010009bf5, mte error info: 0x1fd3f5c60007eff1, vec error info: 0xe7d1ff9e0013db24, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(271) errorStr: The MPU address access is invalid. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=1899772384034012286.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 15:08:22] ERROR: testcase failed (exit 1): tprefetch_async_binding
syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507015 (/tmp/ptoas-board-monitor-a5/runs/20260528_150006_manual_pr699/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 384577] 2026-05-28-15:09:44.708.122 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 12, there is an aicore error exception, core id is 0, error code = 259, dump info: pc start: 0x100040800000, current: 0x100040800110, sc error info: 0xffffffffffff, su error info: 0xfeffffee1efe0030,0x72042208f8007fff, mte error info: 0xd91ddef00007beb6, vec error info: 0, cube error info: 0, l1 error info: 0xfff600167e8e, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(259) errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=9475521060208115623.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507015[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 15:09:50] ERROR: testcase failed (exit 1): syncall_binding
test_intercore_sync_a5_dyn

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 393614 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-28 15:12:37] ERROR: testcase failed (exit 139): test_intercore_sync_a5_dyn
matMul

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 515149 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-28 15:45:54] ERROR: testcase failed (exit 139): matMul
cmps

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v2.bin vs v2.bin, idx=4 (golden=98, out=0)
[ERROR] compare failed
[2026-05-28 16:00:31] ERROR: testcase failed (exit 2): cmps
cmp

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v3.bin vs v3.bin, idx=4 (golden=49, out=0)
[ERROR] compare failed
[2026-05-28 16:00:41] ERROR: testcase failed (exit 2): cmp

@HecreReed
Copy link
Copy Markdown
Collaborator

/run a5

@reedhecre
Copy link
Copy Markdown

已接收 /run a5,A5 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre
Copy link
Copy Markdown

A5 板测失败

  • 触发方式:manual
  • 源码提交:1efacfbfe595
  • 结果汇总:OK 213 / FAIL 6 / SKIP 1
  • 日志:/root/ptoas-board-monitor-a5/logs/20260528_191407_manual_pr699.log
  • 手动指令:/run a5
  • 触发人:HecreReed
  • 触发评论:support vpto backend #699 (comment)
  • 失败阶段:board-validation / exit=1

失败用例

  • tprefetch_async_binding (run, exit=1)
  • syncall_binding (run, exit=1)
  • rope_kv_cache (run, exit=139)
  • plan_memory_loop_in_if (run, exit=139)
  • cmps (run, exit=2)
  • cmp (run, exit=2)

@reedhecre
Copy link
Copy Markdown

A5 板测失败详情:PR #699

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260528_191407_manual_pr699/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1127240] 2026-05-28-19:22:18.712.602 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 19, there is an aivec error exception, core id is 0, error code = 271, dump info: pc start: 0x100040800000, current: 0x1000408001a0, sc error info: 0xffffffffffff, su error info: 0xe6ffd23d139c0059,0xcc3fd0e010009bf5, mte error info: 0x1fd3f5c60007eff1, vec error info: 0xe7d1ff9e0013db24, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(271) errorStr: The MPU address access is invalid. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=1899772384034012286.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 19:22:23] ERROR: testcase failed (exit 1): tprefetch_async_binding
syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507015 (/tmp/ptoas-board-monitor-a5/runs/20260528_191407_manual_pr699/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1134154] 2026-05-28-19:23:45.378.505 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 20, there is an aicore error exception, core id is 0, error code = 259, dump info: pc start: 0x100040800000, current: 0x100040800110, sc error info: 0xffffffffffff, su error info: 0xfeffffee1efe0030,0x72042208f8007fff, mte error info: 0xd91ddef00007beb6, vec error info: 0, cube error info: 0, l1 error info: 0xfff600167e8e, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(259) errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=9475521060208115623.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507015[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 19:23:50] ERROR: testcase failed (exit 1): syncall_binding
rope_kv_cache

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 1214070 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-28 19:42:27] ERROR: testcase failed (exit 139): rope_kv_cache
plan_memory_loop_in_if

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 1248711 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-28 19:52:02] ERROR: testcase failed (exit 139): plan_memory_loop_in_if
cmps

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v2.bin vs v2.bin, idx=4 (golden=98, out=0)
[ERROR] compare failed
[2026-05-28 20:14:50] ERROR: testcase failed (exit 2): cmps
cmp

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v3.bin vs v3.bin, idx=4 (golden=49, out=0)
[ERROR] compare failed
[2026-05-28 20:15:01] ERROR: testcase failed (exit 2): cmp

Zhendong404 and others added 15 commits May 29, 2026 09:25
    Introduce a three-pass pipeline that lowers PTO tile ops to vector-level
    implementations via TileLang DSL templates:

    - ExpandTileOp: invokes TileLang Python DSL to instantiate template
      functions and replaces tile ops with func.call. SpecKey covers all
      operands; tile_buf operands are passed through without bridging.
    - PTOInlineLibCall: extended to recognize tilelang instance functions via
      the  attribute set by the DSL frontend.
    - FoldTileBufIntrinsics: resolves pto.tile_buf_addr / tile_valid_rows /
      tile_valid_cols, including dynamic valid-shape via pto.bind_tile chain
      tracing.
    - MemrefToTileBuf: recovers tile_buf types from memref + bind_tile
      metadata after PlanMemory/InsertSync.
    - PTOViewToMemref: insert pto.bind_tile anchors for tile_buf function
      args so MemrefToTileBuf can recover them.

    Adds new PTO ops (tile_buf_addr/tile_valid_rows/tile_valid_cols),
    ptoas pipeline wiring, design docs, and unit tests.
mouliangyu and others added 25 commits May 29, 2026 12:06
Move lit-discovered .pto tests out of test/basic into test/lit/pto or test/lit/vpto, and update related documentation paths.

Relocate misplaced VPTO sim cases under the micro-op hierarchy, remove a duplicated nested sim case, and add local guidance for choosing PTOAS test frameworks.
Update VPTO lit tests to inspect explicit VPTO IR with --emit-vpto where they check VPTO-level rewrites.

Route *_vpto_llvm tests through the VPTO LLVM lowering dump and check llvm.hivm calls from LLVM dialect MLIR instead of stale VPTO output.
Relocate FileCheck-based VPTO .pto tests from test/vpto to test/lit/vpto so they are discovered by the lit framework.

Wrap auto-vecscope tests in a vector kernel submodule to match the current VPTO container form.
* add tcolargmax/min tileops lib implementation

* fix tcolargmax/min implementations and tile_buf declaration

* add init texp high precision implementation

* add texp high precision implementation

* remove exp high precision code in math.py(already in custom.py)

* add empty lines in math.py

* fix texp high precision test case's eps

* fix texp test case

* move exp hp from custom to exp_hp.py
* Rename VPTO MTE semantic ops

Input: existing VPTO memory-transfer semantic ops and tests using dma/cube/acc_store-style names.

Output: public semantic ops use mte_<src>_<dst> names, with ptr addrspace aliases for the new memory-space spelling.

Key steps: update ODS/parser/lowering/expand/ptr-normalize paths; migrate lit, VPTO, and TileLang test inputs; refresh VPTO/ISA docs and legacy raw-op wrapper mappings.

---------

Co-authored-by: mouliangyu <mouliangyu@huawei.com>
* Add OP for TMrgSort

* Add OP for TMrgSort | fix review

* Add constraints to tmov template for UB2UB ND2ND only

Add constraint function to restrict template_tmov_basic to only support:
- UB2UB: Both src and dst must be in Unified Buffer (memory_space="ub")
- ND2ND: Both tiles must have N-dimensional layout (s_layout=NONE_BOX)

Other scenarios (GM2UB, UB2GM, specialized layouts) require different
implementation paths and are explicitly rejected by this constraint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: caojian5 <caojian5@huawei.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Add HP (HIGH_PRECISION) support for TDiv

* Add HP for divs recip rowexpanddiv colexpanddiv

* [fix] Remove redundant else branch in HIGH_PRECISION mode since MLIR validation guarantees f16/f32 only

* [fix] Add missing HIGH_PRECISION test kernels for tdivs ST test

HP test cases were defined in cases.py but missing from implementation
files, causing FileNotFoundError when compare.py tried to read output.bin.

Added 12 HP kernels across three synced layers:
- tdivs.pto: HIGH_PRECISION kernels with precision_mode attribute
- main.cpp: kCases[] entries and launch wrapper declarations
- launch.cpp: Launch functions with correct IEEE 754 scalar values

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: caojian5 <caojian5@huawei.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@mouliangyu mouliangyu force-pushed the feature-vpto-backend-merge branch from db2bf1f to 21ad259 Compare May 29, 2026 04:48
@zhangstevenunity zhangstevenunity merged commit dffa69c into hw-native-sys:main May 29, 2026
8 checks passed
@reedhecre
Copy link
Copy Markdown

A5 板测成功

  • 触发方式:merged
  • 源码提交:dffa69cf1ed9
  • 结果汇总:OK 21 / FAIL 0 / SKIP 0
  • 日志:/root/ptoas-board-monitor-a5/logs/20260529_171204_merged_pr699.log
  • 结果 TSV:/root/ptoas-board-monitor-a5/logs/20260529_171204_merged_pr699.tsv

@reedhecre
Copy link
Copy Markdown

A3 板测失败

  • 触发方式:merged
  • 源码提交:dffa69cf1ed9
  • 结果汇总:OK 217 / FAIL 2 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260529_171205_merged_pr699.log
  • 失败阶段:board-validation / exit=1

失败用例

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre
Copy link
Copy Markdown

A3 板测失败详情:PR #699

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260529_171205_merged_pr699/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 4174955] 2026-05-29-17:57:17.661.050 (EZ9999):  The error from device(chipId:2, dieId:1), serial number is 500, there is an exception of aicore error, core id is 18, error code = 0, dump info: pc start: 0x124a00000000, current: 0x124a00000188, vec error info: 0, mte error info: 0xc503000030, ifu error info: 0x3002200000000, ccu error info: 0x50000009, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x3000030, fixp_error1 info: 0xc5, fsmId:1, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=5, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-29 17:57:18] ERROR: testcase failed (exit 1): syncall_binding
tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260529_171205_merged_pr699/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 287027] 2026-05-29-17:57:55.400.318 (EZ9999):  The error from device(chipId:2, dieId:1), serial number is 501, there is an exception of aivec error, core id is 29, error code = 0, dump info: pc start: 0x124a00000000, current: 0x124a00000160, vec error info: 0x22000000dc, mte error info: 0x2706000063, ifu error info: 0x2dfff9c540080, ccu error info: 0x1ce6000000000052, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x6000063, fixp_error1 info: 0x27, fsmId:1, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=5, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-29 17:57:56] ERROR: testcase failed (exit 1): tprefetch_async_binding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.