Fixes CUDA vs CPU consistency for index_put_ when accumulating (part 2) (#67189)
Summary:
Description:
- Follow-up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the functorch tests, see https://github.com/pytorch/functorch/issues/195
In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If there is a better way to construct such an index list, I'd be happy to replace the current test.
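For illustration, a minimal sketch of the kind of test described here, not necessarily the exact test added in this PR. The names `put_accumulate` and `run_on` are hypothetical, and the sketch assumes that a TorchScript list `[None, i]` reaches C++ as a list of optional tensors whose first entry is null:

```python
from typing import List, Optional

import torch


@torch.jit.script
def put_accumulate(x: torch.Tensor, i: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Assumption: TorchScript passes this list to index_put_ as Tensor?[],
    # with a null first entry (no indexing along dim 0).
    idx: List[Optional[torch.Tensor]] = [None, i]
    x.index_put_(idx, v, accumulate=True)
    return x


def run_on(device: str) -> torch.Tensor:
    # Hypothetical helper: build the same inputs on the given device.
    x = torch.arange(8, dtype=torch.float32).reshape(4, 2).to(device)
    i = torch.tensor([1, 0], device=device)
    v = torch.tensor([1.0, 2.0], device=device)
    return put_accumulate(x, i, v).cpu()


out_cpu = run_on("cpu")
# Compare against CUDA only when a GPU is present.
if torch.cuda.is_available():
    assert torch.equal(run_on("cuda"), out_cpu)
```

Here the values `[1.0, 2.0]` broadcast across rows, so column 1 is incremented by 1 and column 0 by 2 on every row; the CPU and CUDA results are expected to match.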
cc ngimel zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189
Reviewed By: suo
Differential Revision: D31966686
Pulled By: ngimel
fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4