Decoder Attention CUDA Op #9792
add kernel interface
d0d6cda3
register kernel
28ec5dc0
add self/cross qkv projection without cache
b9048143
add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)
becd565d
refactor ConcatPastToPresent
88fe5e76
DecoderQkvToContext interface
11f10a3a
q,k,v buffer and cache as output
d0cec212
qk, pv and transctx
96100882
fix compiler error on linux machine
738ac483
key_padding_mask
0c14142a
add test_parity file (not yet runnable)
452ae522
add partial unittest
8aeff2a0
change some attributes to inputs
eb5e3eae
--gen_doc
f5f83a97
change kernel interface, add more tests
a0c12c8d
more parity tests
7fe75ae2
fix test
54d04018
fix typo
54f59846
transpose optimizer has a bug; remove it temporarily
960a3b23
add input shape checks
d7f1d57d
add type/shape inference
a332396d
Merge branch 'master' into wangye/decoder_attn_pr
6dcb0006
fix cache shape check
a95dd438
Merge branch 'wangye/decoder_attn_pr' of github.com:microsoft/onnxrun…
bd3eb455
gh-yewang marked this pull request as ready for review 4 years ago
fix rocm build failure
a207efb1
fix rocm build error
b5d205f1
review comments
91fb9df4
review comments
563968e9
gh-yewang merged 6856619b into master 4 years ago
gh-yewang deleted the wangye/decoder_attn_pr branch 4 years ago
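
The commit "add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)" names a layout permutation. As a rough illustration of what that permutation means, here is a minimal CPU reference sketch in C++. It is not the kernel from this PR: the function name `TransQkvReference` is hypothetical, and the reading of the dimension letters (S = sequence length, B = batch size, X = number of stacked projection matrices, N = number of heads, H = head size) is an assumption, since the commit message does not define them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not the PR's CUDA kernel): copies a row-major
// tensor laid out as (S,B,X,N,H) into a row-major tensor laid out as (X,B,N,S,H).
// Assumed meanings: S = sequence length, B = batch, X = stacked matrices,
// N = heads, H = head size.
std::vector<float> TransQkvReference(const std::vector<float>& in,
                                     std::size_t S, std::size_t B, std::size_t X,
                                     std::size_t N, std::size_t H) {
  assert(in.size() == S * B * X * N * H);
  std::vector<float> out(in.size());
  for (std::size_t s = 0; s < S; ++s)
    for (std::size_t b = 0; b < B; ++b)
      for (std::size_t x = 0; x < X; ++x)
        for (std::size_t n = 0; n < N; ++n)
          for (std::size_t h = 0; h < H; ++h) {
            // Flat offset in the source layout (S,B,X,N,H).
            const std::size_t src = (((s * B + b) * X + x) * N + n) * H + h;
            // Flat offset in the destination layout (X,B,N,S,H).
            const std::size_t dst = (((x * B + b) * N + n) * S + s) * H + h;
            out[dst] = in[src];
          }
  return out;
}
```

A reference like this can serve as a parity check for a GPU kernel's output on small shapes, which is in the spirit of the parity tests added later in this PR.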