Decoder Attention CUDA Op (#9792)

Commit

4 years ago

Decoder Attention CUDA Op (#9792)

* add kernel interface
* register kernel
* add self/cross qkv projection without cache
* add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)
* refactor ConcatPastToPresent
* DecoderQkvToContext interface
* q,k,v buffer and cache as output
* qk, pv and transctx
* fix compiler error on linux machine
* key_padding_mask
* add test_parity file. However not runnable
* add partial unittest
* made partial attributes to inputs
* --gen_doc
* change kernel interface, add more tests
* more parity tests
* fix test
* fix typo
* transpose optimizer has bug. remove it temporarily
* add input shape checks
* add type/shape inference
* fix cache shape check
* fix rocm build failure
* fix rocm build error
* review comments
* review comments
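
The layout change named in the LaunchTransQkv2 item, (S,B,X,N,H) -> (X,B,N,S,H), can be illustrated with a small CPU reference. This is only a sketch of the index mapping, not the CUDA kernel from this commit: the function name TransQkvReference is hypothetical, and the reading of the letters (S = sequence length, B = batch size, X = number of stacked projection matrices, N = number of heads, H = head size) is an assumption based on common attention-kernel naming.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the (S,B,X,N,H) -> (X,B,N,S,H) transpose.
// Not the commit's CUDA kernel; it only shows where each element moves.
// Both buffers are dense and row-major (last dimension varies fastest).
std::vector<float> TransQkvReference(const std::vector<float>& input,
                                     std::size_t S, std::size_t B,
                                     std::size_t X, std::size_t N,
                                     std::size_t H) {
  // The input must hold exactly one value per (s,b,x,n,h) coordinate.
  assert(input.size() == S * B * X * N * H);
  std::vector<float> output(input.size());

  for (std::size_t s = 0; s < S; ++s) {
    for (std::size_t b = 0; b < B; ++b) {
      for (std::size_t x = 0; x < X; ++x) {
        for (std::size_t n = 0; n < N; ++n) {
          for (std::size_t h = 0; h < H; ++h) {
            // Flat offset in the source layout (S,B,X,N,H).
            const std::size_t src = (((s * B + b) * X + x) * N + n) * H + h;
            // Flat offset in the destination layout (X,B,N,S,H).
            const std::size_t dst = (((x * B + b) * N + n) * S + s) * H + h;
            output[dst] = input[src];
          }
        }
      }
    }
  }
  return output;
}
```

After the transpose, each of the X slices is contiguous and laid out as (B,N,S,H), so every (batch, head) pair owns one S-by-H block. A GPU version would typically assign these index computations to threads instead of nested loops; how the actual kernel does that is not shown here.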

References

#9792 - Decoder Attention CUDA Op

Author

gh-yewang

Parents

16ddaf56

onnxruntime 6856619b - Decoder Attention CUDA Op (#9792)