onnxruntime
2d6e10ba
- Update Attention and QAttention to support pruned model (#6819)
Commit
4 years ago
Update Attention and QAttention to support pruned model (#6819)

* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.
References
#6819 - Update Attention and QAttention to support pruned model
Author
tianleiwu
Parents
cb8d8464