onnxruntime
2d6e10ba - Update Attention and QAttention to support pruned model (#6819)

Commit

5 years ago

Update Attention and QAttention to support pruned model (#6819)

* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.

References

#6819 - Update Attention and QAttention to support pruned model

Author

tianleiwu

Parents

cb8d8464
