llama.cpp
CANN: refactor mask handling and improve performance in FA
#15561
Merged

hipudding merged 3 commits into ggml-org:master from noemotiovon:fa
github-actions added ggml
github-actions added Ascend NPU
noemotiovon force pushed from c538c5af 117 days ago
noemotiovon force pushed 117 days ago
noemotiovon changed the title from "[CANN]Optimization of unnecessary repeat in the FA operator" to "CANN: refactor mask handling and improve performance in FA" 117 days ago
hipudding commented on 2025-08-26
hipudding commented on 2025-08-26
hipudding approved these changes on 2025-08-26
noemotiovon CANN(flash-attn): refactor mask handling and improve performance (c9456ef0)
noemotiovon [CANN]: fix review (92e61dd1)
noemotiovon [CANN]: Optimization FA BNSD to BSND (db86df3a)
noemotiovon force pushed to db86df3a 116 days ago
hipudding merged 1e748974 into master 116 days ago
