Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (re-open of #85037) (#85373)
CC @ngimel @xwang233 @ptrblck
Adds fix for `get_tolerances`, tested locally on a dgx Volta.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85373
Approved by: https://github.com/ngimel