Add GQA support for ROCm #21032
feat: init rocm gqa
c877adcb
feat: extend strided copy to support runtime tok idx
f845099d
more case
0ea33352
feat: local
99b2feb6
feat: rotary
816249c8
feat: allow rotary to read and write in a strided way, so that we don…
6024dc96
fix: rotary for packed qkv
48092eee
remove debug print
de2f30ae
cloudhan force pushed from d33e0b74 to 19970a75 1 year ago
cloudhan force pushed from 19970a75 to 23e20bc6 1 year ago
cloudhan force pushed from 23e20bc6 to 6ccd1d7c 1 year ago
cloudhan force pushed from 6ccd1d7c to b6be9bde 1 year ago
cloudhan force pushed from b6be9bde to 14d1a1ab 1 year ago
workaround: add flash_attn test to ci
6c4e6125
add gpu arch checking warning log
e9f6d13f
fix: build without ck tile
2b0c46ed
cloudhan force pushed from 14d1a1ab to 2b0c46ed 1 year ago
test: update ci pytorch and triton version to fix tests which have fa…
6091a69b
format
8ca06340
remove unused param is_input_bnsh_format from strided version LaunchR…
e22dfb9e
cloudhan marked this pull request as ready for review 1 year ago
make onnxruntime_USE_COMPOSABLE_KERNEL_CK_TILE depends on onnxruntime…
789fee73
skip test_flash_attn_rocm on CUDA platform
b973217f
fix lint and ci
c3c7089d
fix typo
f4355d46
cloudhan force pushed from 1384ff45 to f4355d46 1 year ago
tianleiwu approved these changes on 2024-07-02
mszhanyi approved these changes on 2024-07-03
cloudhan merged f39ee14b into main 1 year ago
cloudhan deleted the guangyunhan/rocm-gqa branch 1 year ago
Assignees
No one assigned