onnxruntime
[CUDA] PackedMultiHeadAttention support Bias and separated Q, K and V inputs
#16913
Merged

tianleiwu merged 3 commits into main from tlwu/packed_mha_bias
tianleiwu add bias and support query, key and value inputs
9f8373d2
tianleiwu marked this pull request as draft 2 years ago
tianleiwu add test cases
3acdee45
tianleiwu update doc
328dea81
tianleiwu requested a review from yufenglee 2 years ago
tianleiwu assigned gh-yewang 2 years ago
tianleiwu unassigned gh-yewang 2 years ago
tianleiwu requested a review from gh-yewang 2 years ago
tianleiwu marked this pull request as ready for review 2 years ago
gh-yewang approved these changes on 2023-08-01
tianleiwu merged 1fbd1ed1 into main 2 years ago
tianleiwu deleted the tlwu/packed_mha_bias branch 2 years ago
