onnxruntime
Update Attention and QAttention to support pruned model
#6819

Merged

tianleiwu merged 5 commits into master from tlwu/attention_for_pruned_model

Update Attention operator to support pruned model

19fda0ed

tianleiwu requested a review 5 years ago

tianleiwu marked this pull request as draft 5 years ago

update Attention and QAttention cpu kernel

e9129641

Fix invalid embed layer norm fusion test models.

703259d2

update cuda kernels

bef2af64

tianleiwu changed the title ~~[WIP] Update Attention operator to support pruned model~~ Update Attention operator to support pruned model 5 years ago

tianleiwu requested a review from yufenglee 5 years ago

tianleiwu requested a review from gh-yewang 5 years ago

tianleiwu marked this pull request as ready for review 5 years ago

gh-yewang commented on 2021-02-27

fix quantized attention cuda kernel

60451abe

tianleiwu changed the title ~~Update Attention operator to support pruned model~~ Update Attention and QAttention to support pruned model 5 years ago

gh-yewang requested a review from gh-yewang 5 years ago

gh-yewang approved these changes on 2021-02-27

tianleiwu merged 2d6e10ba into master 5 years ago

tianleiwu deleted the tlwu/attention_for_pruned_model branch 5 years ago

Reviewers

gh-yewang

yufenglee

