Update Attention and QAttention to support pruned model #6819
Update Attention operator to support pruned model
19fda0ed
tianleiwu
marked this pull request as draft 4 years ago
update Attention and QAttention cpu kernel
e9129641
Fix invalid embed layer norm fusion test models.
703259d2
update cuda kernels
bef2af64
tianleiwu
changed the title from "[WIP] Update Attention operator to support pruned model" to "Update Attention operator to support pruned model" 4 years ago
tianleiwu
marked this pull request as ready for review 4 years ago
fix quantized attention cuda kernel
60451abe
tianleiwu
changed the title from "Update Attention operator to support pruned model" to "Update Attention and QAttention to support pruned model" 4 years ago
wangyems
approved these changes
on 2021-02-27
tianleiwu
merged
2d6e10ba
into master 4 years ago
tianleiwu
deleted the tlwu/attention_for_pruned_model branch 4 years ago