Add Continuous Decoding support in GQA #21523
gqa supports interactive
ee47ba45
lint, clang, clean-up manually
d938816d
Merge branch 'main' into aciddelgado/gqa_interactive
01cd33b5
new idea, seqlense_q
fdc84b4d
cpu update
1cddf5f3
cpu almost works but segfaults on non-interactive prompt but we gotta…
8e3483e5
single batch implementation unclean
60fe746d
clean up code
dd3c4a66
clang lint
3565dc2f
changes
bd83af79
trigger pipelines and whatnot
aaa98665
merge main
d4e72f85
pipeline
d8f37c00
pipelines
548ab9be
merge main
83588191
minor rotary test change
5ff6050c
pls
255fa1c6
fixes
11c4a0e3
aciddelgado
marked this pull request as ready for review 1 year ago
docs
4a86c55e
aciddelgado
changed the title from "Add Interactive Decoding support in GQA" to "Add Continuous Decoding support in GQA" 1 year ago
address comments
0be4962b
docs
c0cb4c5e
yufenglee
dismissed these changes
on 2024-09-11
comments
019b058a
aciddelgado
dismissed their stale review
via 019b058a
1 year ago
remove cuda helper
e46ac2d7
rocm
a9e4c768
lint
d92e0f40
docs
f089af33
description
5e8d419b
docs
36ca4d14
tianleiwu
approved these changes
on 2024-09-13
aciddelgado
deleted the aciddelgado/gqa_interactive branch 1 year ago