
issue/1117: metax support flash-attn#1119

Open
Ceng23333 wants to merge 4 commits into main from metax_fla

Conversation

@Ceng23333
Collaborator

@Ceng23333 Ceng23333 requested review from a team, Ziminli, kilinchange, voltjia and wooway777 April 3, 2026 04:34
Comment thread xmake.lua Outdated
Comment thread include/infinicore/adaptor/aten_adaptor.hpp Outdated

void run(void *planned_meta) {
#ifdef ENABLE_FLASH_ATTN
#ifdef ENABLE_NVIDIA_API
Collaborator


The bodies of these two ifs are identical, aren't they?

Collaborator


They do indeed look identical.

Collaborator


Yiqun just said that QY also needs this, so can we just merge the branches and drop the check? Or do all three platforms need to be listed explicitly so the other platforms are unaffected?

#include <stdexcept>

#ifdef ENABLE_FLASH_ATTN
#if defined(ENABLE_NVIDIA_API) || defined(ENABLE_METAX_API)
Collaborator


Should QY be added here as well? Or is this if unnecessary?

Collaborator Author


Judging from the original implementation, QY doesn't need this.

Collaborator


If it isn't added, it's needed by default, because QY takes the same path as NV by default.

@wooway777 wooway777 requested a review from qinyiqun April 16, 2026 01:25
Comment thread src/infinicore/ops/multi_head_attention_varlen/mha_varlen_flashattn.cc Outdated

void run(void *planned_meta) {
#ifdef ENABLE_FLASH_ATTN
#ifdef ENABLE_NVIDIA_API
Collaborator


They do indeed look identical.


void run(void *planned_meta) {
#ifdef ENABLE_FLASH_ATTN
#ifdef ENABLE_NVIDIA_API
Collaborator


Yiqun just said that QY also needs this, so can we just merge the branches and drop the check? Or do all three platforms need to be listed explicitly so the other platforms are unaffected?

auto out_tensor = infinicore::adaptor::to_aten_tensor(p->out);
// Paged KV caches must be contiguous for flash-attn; avoid extra copies for q/metadata when already dense.
auto out_at = infinicore::adaptor::to_aten_tensor(p->out);
const bool out_need_copy_back = !out_at.is_contiguous();
Collaborator


Aren't these two contiguous calls unnecessary on NVIDIA itself?
Also, this should be changed to call our own contiguous first and then convert to an aten tensor.

auto v_cache = infinicore::adaptor::to_aten_tensor(p->v_cache);
#elif defined(ENABLE_QY_API)
#elif defined(ENABLE_QY_API) || defined(ENABLE_METAX_API)
auto k_cache = infinicore::adaptor::to_aten_tensor(p->k_cache).contiguous();
Collaborator


Call our own contiguous first, then convert to an aten tensor.

VarlenFlashPrepared t;
// Varlen flash-attn: keep k/v contiguous for dense/paged layout; avoid extra copies for q/metadata when already dense.
t.q = infinicore::adaptor::to_aten_tensor(p->q);
t.k = infinicore::adaptor::to_aten_tensor(p->k).contiguous();
Collaborator


Call our own contiguous first, then convert to an aten tensor.

@Ceng23333 Ceng23333 requested a review from wooway777 April 22, 2026 02:35
Signed-off-by: Ceng23333 <441651826@qq.com>
