Fix TF32 failures in test_linalg.py (#50453)
Summary:
On Ampere GPUs, matmuls with dtype `torch.float` are computed with TF32 by default (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices), which reduces the precision of the results. Linear algebra routines usually need higher precision, so many tests in `test_linalg.py` fail on Ampere GPUs due to precision errors. A small illustration of the effect follows.
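A minimal sketch of the problem (assuming an Ampere or newer GPU is available; tensor sizes and tolerances are illustrative, not taken from the test suite). With TF32 enabled, which is the default, a float32 CUDA matmul drifts further from a float64 reference than a true FP32 matmul does, which is what trips the tight tolerances in `test_linalg.py`:

```python
import torch

a = torch.randn(512, 512, device="cuda", dtype=torch.float)
b = torch.randn(512, 512, device="cuda", dtype=torch.float)

# Float64 reference, unaffected by TF32.
ref = (a.double() @ b.double()).float()

torch.backends.cuda.matmul.allow_tf32 = True   # default on Ampere
err_tf32 = (a @ b - ref).abs().max().item()

torch.backends.cuda.matmul.allow_tf32 = False  # full FP32 matmul
err_fp32 = (a @ b - ref).abs().max().item()

print(err_tf32, err_fp32)  # err_tf32 is typically orders of magnitude larger
```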
To fix this issue:
- Most linear algebra methods, except for matmuls, should add a `NoTF32Guard` so they run in full FP32 precision.
- Expected results in unit tests should be computed with NumPy matmuls instead of PyTorch CUDA matmuls, so the reference values are not affected by TF32 (see the sketch below).
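A hedged sketch of the second point (the tensor names, sizes, and tolerances here are hypothetical, not copied from `test_linalg.py`): the expected value is built with NumPy on the CPU, so only the actual CUDA result is subject to TF32, and the comparison tolerance can absorb the remaining difference.

```python
import numpy as np
import torch

x = torch.randn(64, 64, device="cuda", dtype=torch.float)
y = torch.randn(64, 64, device="cuda", dtype=torch.float)

# Reference computed with NumPy on the CPU, so TF32 never enters the expected value.
expected = np.matmul(x.cpu().numpy(), y.cpu().numpy())
actual = torch.matmul(x, y)

np.testing.assert_allclose(actual.cpu().numpy(), expected, rtol=1e-2, atol=1e-2)
```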
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50453
Reviewed By: glaringlee
Differential Revision: D26023005
Pulled By: ngimel
fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5