onnxruntime
1fbd1ed1 - [CUDA] PackedMultiHeadAttention support Bias and separated Q, K and V inputs (#16913)

Commit
2 years ago
### Description

Follow-up change for PackedMultiHeadAttention added in https://github.com/microsoft/onnxruntime/pull/16779:

- [x] Add Bias input
- [x] Add CUDA kernels to support separated query, key and values inputs.
- [x] Update operator documents
- [x] Add unit tests
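
To illustrate what a Bias input combined with separated query, key and value inputs could look like, here is a minimal NumPy sketch. It is not the CUDA kernel from this commit. It assumes padding-free 2D inputs of shape `(token_count, hidden_size)` and a 1D bias laid out as `[Q | K | V]`; the helper name `add_bias_separated_qkv` and these shapes are illustrative assumptions, so check the operator documentation for the actual contract.

```python
import numpy as np


def add_bias_separated_qkv(query, key, value, bias):
    """Add a concatenated [Q | K | V] bias to separated, padding-free inputs.

    Hypothetical helper for illustration only; the shapes are assumptions:
      query, key: (token_count, hidden)
      value:      (token_count, v_hidden)
      bias:       (hidden + hidden + v_hidden,)
    """
    hidden = query.shape[1]
    v_hidden = value.shape[1]
    assert key.shape[1] == hidden
    assert bias.shape == (2 * hidden + v_hidden,)

    # Split the single bias vector into its Q, K and V segments.
    q_bias, k_bias, v_bias = np.split(bias, [hidden, 2 * hidden])

    # Broadcasting adds each segment to every token row.
    return query + q_bias, key + k_bias, value + v_bias


token_count, hidden = 5, 4
query = np.zeros((token_count, hidden), dtype=np.float32)
key = np.zeros((token_count, hidden), dtype=np.float32)
value = np.zeros((token_count, hidden), dtype=np.float32)
bias = np.arange(3 * hidden, dtype=np.float32)

q, k, v = add_bias_separated_qkv(query, key, value, bias)
```

With zero-valued inputs, each output row simply equals its bias segment, which makes the split easy to check by eye.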