GQA Flash Attention with Attention Mask #18283
squash merge
ef82d4d3
add unit test and fix build
54f25264
undo work in attention_impl file
e2e1157e
reduce tests and change default behavior for past-kv is nullptr
415440b8
test compatibility w/ no cuda
0e84d2df
exclude from amd
9ba69638
fix test script
65731333
make kernels more efficient and make present output required
a87f211d
Merge branch 'main' into aciddelgado/gqa_memeff_v2
16bda285
merge main and memeff changes
08f553df
address comments
eb125226
update ContribOperators.md
6c6aead9
clarify input and output formats memory efficient attention
db307f36
max sequence length for memory efficient attention
e7a50ee9
clang and fix test file
bc1cf0aa
undo clang on unrelated files
44ca857f
Merge branch 'main' into aciddelgado/gqa_memeff_v2
f08495f7
check value and key inputs
660c8fd5
key and value don't check for nullptr since they are required
b9c4d15d
lint
fd0ecc3b
start work on support for right pad gqa
59b8aa59
attention mask for flash attention with cache
afa8ea01
flash attention works no buffer
4c5a32ab
disable memory efficient and pipeline test
90f23c0d
merge main
40f6e3b5
fix warning and lint
e2eadab7
build warnings
d89995f0
remove kv share flag and bnsh flag
9e2fae76
address comments
2fab38e7
docs
791621b0
undo cmake/external/onnx change
66f1600d
eigen update
6d754f68
deps update
b5b62cdc
change seqlen logic
bcd126ae
comments and docs
96b02705
fix seqlen_k for prompt
11777fe0
test cases
070a2f0e
add memory efficient support
0d8786a1
fix lp
b1ed3efa
Merge branch 'aciddelgado/gqa_seqlens_k_lp_make_it_right' into acidde…
20dd3303
left padding thing
4d2f37e0
Merge branch 'main' into aciddelgado/gqa_seqlens_k
4137c055
lint
70de4f4b
fix left padding and make seqlens clearer
e9672f82
fix left padding indexing issue
7afaa455
fix build break on windows
d9e2a8fd
Merge branch 'aciddelgado/gqa_seqlens_k' of github.com:microsoft/onnx…
8a7c57fb
nit and documentation
4801fe82
remove padding
1e1a25c3
docs
4b3bec16
lint
4a968818
yufenglee approved these changes on 2023-11-07
aciddelgado deleted the aciddelgado/gqa_seqlens_k branch 2 years ago