Relax divisibility by 16 for leading dimension of mat1 in scaled_gemm (#108308)
# Summary
cuBLASLt requires that the matrices be 16-byte aligned. However, if mat1.size(-1) % 16 == 0 and the matrix is row major, then the leading dimension can be any value. See this comment: https://github.com/pytorch/pytorch/pull/107341#discussion_r1310934737
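A minimal sketch of the relaxed eligibility check described above, assuming a 2-D tensor whose leading dimension is its row stride; the function name and structure are illustrative, not the code added by this PR:

```python
import torch

def mat1_ld_ok(mat1: torch.Tensor) -> bool:
    """Hypothetical sketch of the relaxed check; not the actual
    PyTorch implementation. Assumes a 1-byte-per-element dtype
    (e.g. fp8), so 16 elements correspond to 16 bytes."""
    is_row_major = mat1.stride(-1) == 1
    ld = mat1.stride(-2)  # leading dimension: the row stride of a row-major matrix
    if is_row_major and mat1.size(-1) % 16 == 0:
        return True  # relaxed case: ld may be any value
    return ld % 16 == 0  # otherwise keep the divisibility-by-16 requirement
```

Under this sketch, a row-major slice whose inner dimension is a multiple of 16 would now pass even with a row stride that is not divisible by 16, whereas the stricter pre-PR check would have rejected it.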
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108308
Approved by: https://github.com/eqy, https://github.com/vkuzo