onnxruntime
GQA Rotary and Packed QKV with Flash
#18906
Merged

aciddelgado merged 58 commits into main from aciddelgado/gqa_rotary_packed
aciddelgado squash merge
ef82d4d3
aciddelgado add unit test and fix build
54f25264
aciddelgado undo work in attention_impl file
e2e1157e
aciddelgado reduce tests and change default behavior for past-kv is nullptr
415440b8
aciddelgado test compatibility w/ no cuda
0e84d2df
aciddelgado exclude from amd
9ba69638
aciddelgado fix test script
65731333
aciddelgado work on local attention flash
bd79b6d7
aciddelgado vscode idk
fb8e386d
aciddelgado make kernels more efficient and make present output required
a87f211d
aciddelgado Merge branch 'main' into aciddelgado/gqa_memeff_v2
16bda285
aciddelgado merge main and memeff changes
08f553df
aciddelgado address comments
eb125226
aciddelgado update ContribOperators.md
6c6aead9
aciddelgado merge main
6e540e33
aciddelgado Merge branch 'aciddelgado/gqa_memeff_v2' into aciddelgado/gqa_local
b3a9d0f3
aciddelgado local working with flash not memeff
3d7f3bf6
aciddelgado clarify input and output formats memory efficient attention
db307f36
aciddelgado max sequence length for memory efficient attention
e7a50ee9
aciddelgado clang and fix test file
bc1cf0aa
aciddelgado undo clang on unrelated files
44ca857f
aciddelgado Merge branch 'main' into aciddelgado/gqa_memeff_v2
f08495f7
aciddelgado check value and key inputs
660c8fd5
aciddelgado key and value dont check for nullptr since they are required
b9c4d15d
aciddelgado fix up test script
afe22a43
aciddelgado fix packedmha, clean test, merge gqa_memeff_v2 branch changes
01256785
aciddelgado Merge branch 'main' into aciddelgado/gqa_local
353c4f59
aciddelgado local working with recent changes
94b5efbe
aciddelgado no local w memeff
ddb7a66b
aciddelgado undo unnecessary changes
d33c69b6
aciddelgado undo change symbolic_shape_infer.py
26ca6d5e
aciddelgado fix pipeline
163a81ec
aciddelgado docs
4326c8db
yufenglee make prompt to use the kv new
b0a1006c
aciddelgado clean up
23c22a33
aciddelgado update documentation
e97a6fda
aciddelgado Merge branch 'main' into aciddelgado/gqa_local
93cb019d
aciddelgado start work rotary
3c332f06
aciddelgado Merge branch 'main' into aciddelgado/gqa_rotary
b26b4cb8
aciddelgado rotary work
082f347d
aciddelgado Merge branch 'main' into aciddelgado/gqa_rotary
ae34d0cf
aciddelgado Merge branch 'yufeng/gqa_opt' into aciddelgado/gqa_rotary
791bbc3d
aciddelgado rotary fully implemented
6e6ad2cb
aciddelgado packed working
f67316d7
aciddelgado requested a review from tianleiwu 2 years ago
aciddelgado requested a review from yufenglee 2 years ago
aciddelgado changed the title from "Aciddelgado/gqa rotary packed" to "GQA Rotary and Packed QKV with Flash" 2 years ago
github-advanced-security commented on 2023-12-21
github-advanced-security commented on 2023-12-21
aciddelgado run formatters
80845859
github-advanced-security commented on 2023-12-21
github-advanced-security commented on 2023-12-21
aciddelgado enable gpu linux pipeline transformers tests
94afb76e
aciddelgado requested a review 2 years ago
aciddelgado docs and pipeline test
40cfc268
tianleiwu commented on 2024-01-02
tianleiwu commented on 2024-01-04
tianleiwu commented on 2024-01-04
aciddelgado Merge branch 'main' into aciddelgado/gqa_rotary_packed
a821588a
aciddelgado test
b4732832
aciddelgado run pipeline
152d920e
aciddelgado retrigger checks
f271a74e
aciddelgado conflict and requirements change
ba13a3f7
aciddelgado disable transformers test
723637ed
aciddelgado add todo and format
87533ef8
aciddelgado fix lint issue
615d500c
yufenglee commented on 2024-01-13
aciddelgado merge conflict
dba1e7e1
tianleiwu commented on 2024-01-22
tianleiwu commented on 2024-01-22
tianleiwu dismissed these changes on 2024-01-22
aciddelgado address comments
e7863b30
aciddelgado dismissed their stale review via e7863b30 2 years ago
tianleiwu dismissed these changes on 2024-01-22
aciddelgado lintrunner
5b554246
aciddelgado dismissed their stale review via 5b554246 2 years ago
tianleiwu approved these changes on 2024-01-23
aciddelgado merged cbb29d80 into main 2 years ago
aciddelgado deleted the aciddelgado/gqa_rotary_packed branch 2 years ago
tianleiwu added release:1.17.0
snnn removed release:1.17.0
