Pad low-precision matmuls when requested (#90235)
Matmul padding is beneficial not only for fp32; fp16/bf16 matmuls under AMP can benefit as well.
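
As a rough illustration of what matmul padding means, the sketch below zero-pads both operands so that every dimension is a multiple of some alignment, multiplies them, and slices the result back to its original shape. It is not the code from this PR: the helper names (`round_up`, `pad_matrix`, `padded_matmul`) are hypothetical, the alignment of 8 is an assumed value, and plain Python lists stand in for tensors so that the example stays self-contained.

```python
# Illustrative sketch only; helper names and the alignment of 8 are assumptions.

def round_up(n, multiple=8):
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_matrix(mat, rows, cols):
    """Zero-pad a list-of-lists matrix on the right and bottom to rows x cols."""
    padded = [row + [0.0] * (cols - len(row)) for row in mat]
    padded += [[0.0] * cols for _ in range(rows - len(mat))]
    return padded


def matmul(a, b):
    """Plain reference matmul for list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def padded_matmul(a, b, multiple=8):
    """Multiply a (m x k) by b (k x n) after padding m, k, n up to `multiple`."""
    m, k, n = len(a), len(a[0]), len(b[0])
    mp, kp, np_ = round_up(m, multiple), round_up(k, multiple), round_up(n, multiple)
    out = matmul(pad_matrix(a, mp, kp), pad_matrix(b, kp, np_))
    # The extra rows and columns only hold zeros, so slicing them off
    # recovers the unpadded product.
    return [row[:n] for row in out[:m]]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 x 2
result = padded_matmul(a, b)
```

The padded zeros contribute nothing to any dot product, so the result is unchanged. Any benefit comes from the backend choosing faster kernels for aligned shapes, which is why padding is only worthwhile when that speedup outweighs the cost of the extra copies.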
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90235
Approved by: https://github.com/jiawenliu64
Author: Natalia Gimelshein