onnxruntime
Update Attention and QAttention to support pruned model
#6819
Merged

tianleiwu merged 5 commits into master from tlwu/attention_for_pruned_model
tianleiwu committed 19fda0ed: Update Attention operator to support pruned model
tianleiwu requested a review 4 years ago
tianleiwu marked this pull request as draft 4 years ago
tianleiwu committed e9129641: Update Attention and QAttention CPU kernels
tianleiwu committed 703259d2: Fix invalid embed layer norm fusion test models
tianleiwu committed bef2af64: Update CUDA kernels
tianleiwu changed the title from "[WIP] Update Attention operator to support pruned model" to "Update Attention operator to support pruned model" 4 years ago
tianleiwu requested a review from yufenglee 4 years ago
tianleiwu requested a review from wangyems 4 years ago
tianleiwu marked this pull request as ready for review 4 years ago
wangyems commented on 2021-02-27
tianleiwu committed 60451abe: Fix quantized attention CUDA kernel
tianleiwu changed the title from "Update Attention operator to support pruned model" to "Update Attention and QAttention to support pruned model" 4 years ago
wangyems requested a review from wangyems 4 years ago
wangyems approved these changes on 2021-02-27
tianleiwu merged commit 2d6e10ba into master 4 years ago
tianleiwu deleted the tlwu/attention_for_pruned_model branch 4 years ago