[Kernel] Flash Attention 3 Support #12093
LucasWilkinson changed the title from "[WIP][Kernel] Flash Attention 3 Support" to "[Kernel] Flash Attention 3 Support" 1 year ago
LucasWilkinson marked this pull request as ready for review 1 year ago
build fa3 from vllm
4ec98938
v0 update to seqused_k
b3203983
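For context on this commit, `seqused_k` tells the attention kernel how many key/value tokens of each (possibly padded or paged) sequence are actually valid. The sketch below only shows how such a tensor might be built from per-request context lengths; the variable names are illustrative and not vLLM's actual code.

```python
import torch

# Hypothetical per-request context lengths for a batch of 3 sequences.
context_lens = [17, 128, 5]

# seqused_k: one int32 entry per sequence, giving the number of valid
# K/V tokens the attention kernel should read for that sequence
# (in practice this tensor would live on the GPU).
seqused_k = torch.tensor(context_lens, dtype=torch.int32)
```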
add env var for controlling version
a84cdda1
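A minimal sketch of how a version-selection environment variable could be read; the variable name `VLLM_FLASH_ATTN_VERSION` and the validation logic here are assumptions for illustration, not necessarily the exact code added in this commit.

```python
import os
from typing import Optional


def flash_attn_version_override() -> Optional[int]:
    """Read an optional FlashAttention version override from the environment.

    The variable name VLLM_FLASH_ATTN_VERSION is assumed for illustration;
    returns None when no override is set.
    """
    value = os.environ.get("VLLM_FLASH_ATTN_VERSION")
    if value is None:
        return None
    version = int(value)
    if version not in (2, 3):
        raise ValueError(f"unsupported FlashAttention version: {version}")
    return version
```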
update branch and python only build
71e5becb
fix mypy error
a48a9ad4
minor refactors
8d578cf6
codespell fix
234f6c19
update git hash
6d8e439f
update fa3
7f82709a
specify missing fa versions
ef4283b6
update fa3
509e9d85
add assert
34100161
fix tests + softmax
c5b6b4a2
fix building sm80 on H200
10b3cd2c
mgoin commented on 2025-01-22
disable fp8 for now
2f9015eb
fix building .cu files that are included by other .cu files
d577b108
fix glob
e8c2fe1c
cut down fa3 binary size
ded7cb6d
update vllm-flash-attn cmake
6ccb5b25
disable fa3 on 8.6 and 8.9
0c16227a
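As a rough illustration of the gating this commit describes, the sketch below enables FA3 only on Hopper (compute capability 9.0) and falls back to FA2 on SM 8.6 and 8.9; the helper name and fallback policy are assumptions, not the PR's exact logic.

```python
import torch


def select_flash_attn_version() -> int:
    """Pick FA3 only on compute capability 9.0 (Hopper); fall back to FA2
    on SM 8.6 / 8.9 and other pre-Hopper GPUs.

    Illustrative sketch only; not the PR's exact gating code.
    """
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) == (9, 0):
        return 3
    return 2
```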
disable fa build on AMD
eafde62d
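A rough sketch of how a Python build script might detect an AMD/ROCm environment and skip the FlashAttention build; the ROCm check via `torch.version.hip` is a common pattern, and the flag name here is an assumption rather than the PR's actual build logic (which may live in CMake).

```python
import torch

# torch.version.hip is a version string on ROCm builds of PyTorch and
# None on CUDA builds; skip the FlashAttention build on AMD.
is_rocm = torch.version.hip is not None
build_flash_attn = not is_rocm
print(f"build_flash_attn={build_flash_attn}")
```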