Feat/cast multi backend #56

Draft
gongchensu wants to merge 16 commits into InfiniTensor:feat/ascend-operators from gongchensu:feat/cast-multi-backend


@gongchensu
Contributor

No description provided.

zhangyue and others added 16 commits April 15, 2026 13:34
- Add AclTensorCache for descriptor reuse across operator calls
- Rename ToAclDtype/IsIntegerDtype to toAclDtype/isIntegerDtype (camelCase)
- Extend WorkspacePool with multi-slot support and capture-mode assertion
- Optimize Gemm kernel with executor/scalar caching
- Add CacheKey hash support for operator instance caching
- Fix generate_wrappers.py argument ordering and format
- Rename skip_unsupported_dtypes fixture, add get_npu_stream utility
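
The `CacheKey` hash support mentioned above can be illustrated with a minimal sketch. The real code in this PR is C++; the Python below only shows the general idea of keying cached operator instances on a hashable description of the call. The field names (`op_name`, `shapes`, `dtype`) and the `OperatorCache` class are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class CacheKey:
    """Hypothetical key; frozen dataclasses get __eq__ and __hash__ for free."""

    op_name: str
    shapes: Tuple[Tuple[int, ...], ...]
    dtype: str


class OperatorCache:
    """Builds an operator instance once per distinct key and reuses it."""

    def __init__(self) -> None:
        self._instances: Dict[CacheKey, object] = {}
        self.builds = 0  # counts how often the factory actually ran

    def get_or_create(self, key: CacheKey, factory: Callable[[], object]) -> object:
        if key not in self._instances:
            self._instances[key] = factory()
            self.builds += 1

        return self._instances[key]
```

Two calls with equal keys (same operator, shapes, and dtype) return the same instance, so per-call setup cost is paid once per distinct configuration.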
Add base classes: Cast, Cat, Linear, Matmul (replaces MatMul), Mul,
PagedAttention, SiluAndMul.

Rename AddRmsNorm params to match CANN convention (x1/x2/gamma/y_out/x_out).
Remove verbose doc comments from FlashAttention, ReshapeAndCache,
RotaryEmbedding base classes (implementation details belong in kernels).
Add ACLNN-based implementations for: Add, Cast, Cat, CausalSoftmax,
FlashAttention, Linear, Matmul, Mul, RmsNorm, RotaryEmbedding,
ReshapeAndCache (+ v2), Swiglu, SiluAndMul.

All kernels use AclTensorCache for descriptor reuse and
WorkspacePool for device memory management. Executor instances
are cached with aclSetAclOpExecutorRepeatable for repeat dispatch.
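
As a rough illustration of the multi-slot workspace reuse described above, the sketch below keeps one growable buffer per slot and reallocates only when a request exceeds the current size. This is an assumption about the general pattern, not the PR's C++ `WorkspacePool`: the `ensure` method, the integer slot ids, and the use of `bytearray` as a stand-in for device memory are all hypothetical.

```python
from typing import Dict


class WorkspacePoolSketch:
    """Toy pool: one buffer per slot, grown on demand and otherwise reused."""

    def __init__(self) -> None:
        self._slots: Dict[int, bytearray] = {}
        self.allocations = 0  # counts real (re)allocations

    def ensure(self, slot: int, size: int) -> bytearray:
        buf = self._slots.get(slot)

        # Reallocate only if the slot is empty or too small for this request.
        if buf is None or len(buf) < size:
            buf = bytearray(size)
            self._slots[slot] = buf
            self.allocations += 1

        return buf
```

Separate slots let two kernels hold workspaces at the same time without overwriting each other, while repeated calls on one slot avoid repeated allocation.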
Add alternative implementations with registries:
- AddRmsNorm: decomposed (0), fused aclnnAddRmsNorm (1), custom AscendC (2)
- RmsNorm: ACLNN (0), custom AscendC (1)
- RotaryEmbedding: ACLNN (0), ATB Rope (1)
- ReshapeAndCache: ACLNN (0), ScatterPaKvCache (1), ATB (2)
- Swiglu: decomposed (0), fused aclnnSwiGlu (1)
- SiluAndMul: fused aclnnSwiGlu (0), registry (1)
- PagedAttention: ATB (0)
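
The numbered alternatives above suggest a registry that maps an operator name and an implementation index to a callable. A minimal sketch of that pattern follows, assuming a decorator-based registration; `register`, `get_impl`, and the two toy `swiglu` functions are hypothetical, and the pure-Python math (SiLU of the first input times the second) only stands in for the real ACLNN and AscendC kernels.

```python
import math
from typing import Callable, Dict, List

Impl = Callable[[List[float], List[float]], List[float]]

# op name -> implementation index -> callable
_REGISTRY: Dict[str, Dict[int, Impl]] = {}


def register(op_name: str, index: int) -> Callable[[Impl], Impl]:
    """Decorator that records `fn` as implementation `index` of `op_name`."""

    def decorator(fn: Impl) -> Impl:
        impls = _REGISTRY.setdefault(op_name, {})
        if index in impls:
            raise ValueError(f"{op_name} already has implementation {index}")
        impls[index] = fn

        return fn

    return decorator


def get_impl(op_name: str, index: int = 0) -> Impl:
    """Look up an implementation; index 0 is treated as the default."""
    try:
        return _REGISTRY[op_name][index]
    except KeyError:
        raise KeyError(f"no implementation {index} registered for {op_name}") from None


def _silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


@register("swiglu", 0)
def swiglu_decomposed(gate: List[float], up: List[float]) -> List[float]:
    # Two passes: activation first, then the elementwise multiply.
    activated = [_silu(g) for g in gate]

    return [a * u for a, u in zip(activated, up)]


@register("swiglu", 1)
def swiglu_fused(gate: List[float], up: List[float]) -> List[float]:
    # One pass: activation and multiply together.
    return [_silu(g) * u for g, u in zip(gate, up)]
```

A test can then be parametrized over the index, for example calling `get_impl("swiglu", i)` for each registered `i` and comparing the results against a reference.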
Standalone AscendC kernel project with CMake build system.
Includes op_host tiling, op_kernel device code, precision tests,
and msprof benchmarks for both operators.
Add new tests: Cast, Cat, E2E Layer, FlashAttention, Linear, Matmul,
Mul, PagedAttention, ReshapeAndCache, RotaryEmbedding, SiluAndMul.
Update existing tests with NPU stream handling and Ascend-specific
parametrization.
- C1: auto-format all C++ files with clang-format (25 files)
- C4: lowercase assert messages, remove trailing periods (10 messages)
- G4: backtick-fence identifiers in comments (causal_softmax)
- P5: add blank lines before return statements (generate_wrappers.py)
- C4: lowercase assert message starts (workspace_pool_, rms_norm, rotary_embedding)
- C4: remove trailing period from workspace_pool_ assert
- C9: add blank line between SlotKey struct members
- G4: backtick-fence identifiers in comments across 12 files
- G4: backtick-fence identifiers in assert messages (flash_attention, rotary_embedding)
- P1: remove duplicate `import re` in generate_wrappers.py
- P4: add blank lines around control flow in test_flash_attention.py
- C4: lowercase "rope" in ATB assert messages
- G4: backtick-fence `VariantPack`, `rotaryCoeff`, `sparseMode`, `hostData`
- G4: backtick-fence identifiers in Python test comments
- P4: add blank line before `if` in test_rms_norm_precision.py

- Delete `test_rms_norm_precision.py` (duplicate of `tests/test_rms_norm.py`)
- Delete `run_rms_norm_precision_report.py` (another copy with hardcoded path)
- Unify `test_add_rms_norm.py` to use `import ascend_kernel` instead of
  ctypes manual loading
@gongchensu gongchensu self-assigned this Apr 16, 2026
@gongchensu
Contributor Author

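A100 build and operator tests:
image
Iluvatar (天数) build and operator tests:
image
MetaX (沐曦) build and operator tests:
image
Moore Threads (摩尔) build and operator tests:
image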

@zhangyue207 zhangyue207 force-pushed the feat/ascend-operators branch from 0d93135 to df07f95 Compare April 17, 2026 20:34
