Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu
left a comment
Thanks for the PR, I left one question.
```python
if attention_mask.ndim == 4:
    # NPU does not support automatic broadcasting for this type; the mask must be expanded.
    if attention_mask.device.type == 'npu' and attention_mask.shape[1:3] == (1, 1):
```
Can we verify that this would also work if we explicitly set the backend to npu?
When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to:
`get unsupported atten_mask shape, the shape is [B, 1, 1, S]`
Only shapes like [B, N, S, S], [B, 1, S, S], [1, 1, S, S], or [S, S] are accepted.
The _native_npu_attention function operates correctly as it leverages _maybe_modify_attn_mask_npu to reshape the attention mask from [batch_size, seq_len_k] to [batch_size, 1, seq_len_q, seq_len_k]. This reshaped format is compatible with the NPU backend.
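To illustrate, here is a minimal sketch of the kind of reshaping described above, turning a 2-D or broadcast-shaped 4-D mask into the [batch, 1, seq_len_q, seq_len_k] layout the NPU operator accepts. The function name and exact semantics are assumptions for illustration, not the actual `_maybe_modify_attn_mask_npu` implementation:

```python
import torch

def expand_attn_mask_for_npu(attention_mask: torch.Tensor, seq_len_q: int) -> torch.Tensor:
    """Hypothetical sketch: expand a mask so the NPU fusion-attention op accepts it."""
    # [batch, seq_len_k] -> [batch, 1, 1, seq_len_k]
    if attention_mask.ndim == 2:
        attention_mask = attention_mask[:, None, None, :]
    # [batch, 1, 1, seq_len_k] -> [batch, 1, seq_len_q, seq_len_k]
    # (materializes the broadcast the NPU backend does not perform automatically)
    if attention_mask.ndim == 4 and attention_mask.shape[1:3] == (1, 1):
        attention_mask = attention_mask.expand(-1, 1, seq_len_q, -1)
    return attention_mask

mask = torch.ones(2, 16)                        # [batch=2, seq_len_k=16]
out = expand_attn_mask_for_npu(mask, seq_len_q=8)
print(tuple(out.shape))                         # (2, 1, 8, 16)
```

Note that `expand` returns a broadcast view without copying data; a backend that needs contiguous memory would additionally require `.contiguous()`.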
Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md
What does this PR do?
Fix attention_mask broadcasting for NPU compatibility