[CUDA] PackedMultiHeadAttention support Bias and separated Q, K and V inputs #16913
add bias and support query, key and value inputs
9f8373d2
tianleiwu marked this pull request as draft 2 years ago
add test cases
3acdee45
update doc
328dea81
tianleiwu marked this pull request as ready for review 2 years ago
gh-yewang approved these changes on 2023-08-01
tianleiwu merged 1fbd1ed1 into main 2 years ago
tianleiwu deleted the tlwu/packed_mha_bias branch 2 years ago