PR #9792 Decoder Attention CUDA Op

gh-yewang merged 28 commits into master from wangye/decoder_attn_pr

add kernel interface

d0d6cda3

28ec5dc0

add self/cross qkv projection without cache

b9048143

add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)

becd565d

refactor ConcatPastToPresent

88fe5e76

DecoderQkvToContext interface

11f10a3a

q,k,v buffer and cache as output

d0cec212

qk, pv and transctx

96100882

fix compiler error on linux machine

738ac483

key_padding_mask

0c14142a

add test_parity file. However not runnable

452ae522

add partial unittest

8aeff2a0

made partial attributes to inputs

eb5e3eae

--gen_doc

f5f83a97

change kernel interface, add more tests

a0c12c8d

more parity tests

7fe75ae2

fix test

54d04018

fix typo

54f59846

transpose optimizer has bug. remove it temporarily

960a3b23

add input shape checks

d7f1d57d

add type/shape inference

a332396d

Merge branch 'master' into wangye/decoder_attn_pr

6dcb0006

fix cache shape check

a95dd438

Merge branch 'wangye/decoder_attn_pr' of github.com:microsoft/onnxrun…

bd3eb455

gh-yewang marked this pull request as ready for review 4 years ago

gh-yewang requested a review from hanbitmyths 4 years ago

hanbitmyths commented on 2021-11-18

fix rocm build failure

a207efb1

gh-yewang requested a review from hanbitmyths 4 years ago

fix rocm build error

b5d205f1

hanbitmyths commented on 2021-11-19

review comments

91fb9df4

gh-yewang requested a review from hanbitmyths 4 years ago

hanbitmyths commented on 2021-11-19

review comments

563968e9

gh-yewang requested a review from hanbitmyths 4 years ago

hanbitmyths approved these changes on 2021-11-20

gh-yewang merged 6856619b into master 4 years ago

gh-yewang deleted the wangye/decoder_attn_pr branch 4 years ago

Reviewers

hanbitmyths
