fixing csr addmm bug (#58768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58768
Fixes gh-58757
This PR fixes the CPU version of the addmm op. For context, before this PR only CSR @ vector was supported. I found a minor bug in the non-MKL code path of addmm_out_sparse_csr_dense_cpu, which this PR fixes.
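For reference, the operation that the non-MKL path has to compute can be sketched in plain Python as below. This is only an illustrative reference, not the code in this PR: the function name `csr_addmm` and its argument names are hypothetical, and it assumes the usual addmm semantics `out = beta * input + alpha * (A @ dense)` with `A` stored as CSR row pointers, column indices, and values.

```python
def csr_addmm(inp, crow, col, vals, dense, beta=1.0, alpha=1.0):
    """Reference out = beta * inp + alpha * (A @ dense), with A given in CSR form.

    crow: row pointers (length n_rows + 1)
    col:  column index of each stored value
    vals: stored (nonzero) values
    """
    n_rows = len(crow) - 1
    n_cols = len(dense[0])
    # Start from the scaled input term.
    out = [[beta * inp[i][j] for j in range(n_cols)] for i in range(n_rows)]
    for i in range(n_rows):
        # Stored entries of row i live in positions crow[i] .. crow[i + 1] - 1.
        for p in range(crow[i], crow[i + 1]):
            k, a = col[p], vals[p]
            for j in range(n_cols):
                out[i][j] += alpha * a * dense[k][j]
    return out


# A is 2x3 (deliberately non-square): [[1, 0, 2], [0, 3, 0]]
crow, col, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2
inp = [[1.0, 1.0], [1.0, 1.0]]                # 2x2
result = csr_addmm(inp, crow, col, vals, dense)
print(result)  # [[12.0, 15.0], [10.0, 13.0]]
```

A small non-square case like this one is a convenient sanity check, since row and column counts cannot be confused without changing the result.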
Moreover, I discovered a limitation in the current MKL implementation: it only produces results within an acceptable error tolerance for square matrices. After looking into this issue in depth, I found that it could be a limitation of the MKL API.
I used this [gist code](https://gist.github.com/aocsa/0606e833cd16a8bfb7d37a5fbb3a5b14) based on [this](https://github.com/baidu-research/DeepBench/blob/master/code/intel/spmm/spmm_bench.cpp) to test this behavior.
As you can see, the output error (last column) is acceptable when the matrices are square and not acceptable when they are not square. I reported the issue here: https://github.com/pytorch/pytorch/issues/58770
Looking forward to your comments.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D28629563
Pulled By: malfet
fbshipit-source-id: 5ee00ae667336e0d9301e5117057213f472cbc86