onnxruntime
2d6e10ba - Update Attention and QAttention to support pruned model (#6819)

Commit
4 years ago
Update Attention and QAttention to support pruned model (#6819)

* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.