Reduce memory requirement for test_argminmax_large_axis (#40036)
Summary:
Closes gh-39060
The `TensorIterator` splitting is based on `can_use_32bit_indexing`, which assumes 32-bit signed ints, so an axis length of just 2**31 is enough to trigger it. I also verified on an old commit that the test failure reproduces with just a 1d tensor. Together these changes quarter the memory requirement for the test.
https://github.com/pytorch/pytorch/blob/4c7d81f8479bce320cc11d1eb3adaf8ab0b90099/aten/src/ATen/native/TensorIterator.cpp#L879
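As a rough illustration of why 2**31 is the smallest axis length that forces a split, here is a minimal sketch of the size arithmetic only. The helper `fits_32bit_indexing` is a hypothetical stand-in, not the real `can_use_32bit_indexing`; it models just the element-count comparison against the largest signed 32-bit value and ignores anything else the real function checks.

```python
# Largest value representable in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def fits_32bit_indexing(numel: int) -> bool:
    """Hypothetical simplification: compare the element count to INT32_MAX.

    This only models the size check; it is not the ATen implementation.
    """
    return numel <= INT32_MAX


axis_len = 2**31

# One element fewer still fits in 32-bit indexing, so no split would be needed.
print(fits_32bit_indexing(axis_len - 1))

# Exactly 2**31 elements no longer fits, which is the case the test needs to hit.
print(fits_32bit_indexing(axis_len))
```

Under this simplified model, any axis longer than 2**31 allocates extra memory without exercising a different code path.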
For reference, the test was first added in gh-33310.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40036
Differential Revision: D22068690
Pulled By: ezyang
fbshipit-source-id: 83199fd31647d1ef106b08f471c0e9517d3516e3