Implement FlashAttention for CPU #20805
duanqn commented on 2024-05-27
duanqn commented on 2024-05-27
duanqn force pushed from db5fcb26 to f7235b30 1 year ago
duanqn force pushed from 37d23258 to c8c12fff 1 year ago
duanqn commented on 2024-06-19
duanqn force pushed from c2da456a to f8584305 1 year ago
duanqn force pushed from f8584305 to 599ac3f3 1 year ago
tianleiwu marked this pull request as ready for review 1 year ago
duanqn force pushed from 7b82ac51 to 60e22806 1 year ago
duanqn commented on 2024-06-21
tianleiwu changed the title from "[WIP] Implement FlashAttention for CPU" to "Implement FlashAttention for CPU" 1 year ago
duanqn force pushed from 57eba5fd to 7b130037 1 year ago
toothache approved these changes on 2024-06-25
tianleiwu dismissed these changes on 2024-06-28
duanqn dismissed their stale review via 5a96b44e 1 year ago
Register new contrib op FlashAttention (37a31756)
Move getenv to constructor (4ebe4546)
Get Env (0d65ce26)
Renaming (42b2acb8)
Check for T==float (88a2600c)
Lintrunner (53e2e851)
Remove mlas function (945f656a)
Handle scale; Require present_k and present_v to be empty (ee323fb4)
Check is_unidirectional_ (63e76ad0)
fix build (3d6368b3)
Merge with mlas.h (1fba73a3)
Add comment and MLASCALL (9479623e)
Remove unnecessary change (1e63e825)
Fix onnxruntime_mlas.cmake (1fd0813d)
Pick onnxruntime/test/python/transformers/benchmark_mha.py from lates… (afb74661)
Disable FlashAttention by default (327b4c2e)
Fix value choice of row_size_q and row_size_kv; Add comments (ab0da5b1)
Fix order (8b190947)
causal=False (8b2270a2)
Add MLASCALL on implementation (b449524e)
Improve comment (06251b1a)
Enable FlashAttention by default (27b18d43)
lintrunner -a (3059b44a)
Remove memset (412f219b)
Fix l2_cache_size_ (44ff8f0a)
Fix PREfast (54213354)
#include <algorithm> (03d8f363)
Fix bug (d63e528b)
lintrunner (7a3d4a6c)
Renaming (bf014d0b)
Renaming (baff456a)
Use MlasSgemmOperation (72f3c677)
Move threading inside MLAS kernel (e8a4373c)
Remove 1 TODO (e1cf2890)
Renaming (852fd98d)
duanqn force pushed from 5554531b to 852fd98d 1 year ago
tianleiwu approved these changes on 2024-07-11
yufenglee approved these changes on 2024-07-11
yufenglee merged 80b56feb into main 1 year ago
Assignees
No one assigned