Fixes CUDA vs CPU consistency for index_put_ when accumulating (#66790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39227
Fixes https://github.com/pytorch/pytorch/issues/66495 (duplicate of 39227)
Description:
- Expands values in the CUDA implementation
- Improves shape checking for CUDA
- Improves the error message for CUDA
- Adds tests
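
For illustration only, the intended accumulate semantics can be sketched as a pure-Python reference model. This is not the PyTorch code touched by this PR; the helper name index_put_accumulate and the restriction to 2-D data indexed by rows are assumptions made for the sketch. It roughly models base.index_put_((rows,), values, accumulate=True), where a single row of values is expanded to one row per index and duplicate indices each contribute to the sum.

```python
def index_put_accumulate(base, rows, values):
    """Hypothetical reference model of accumulate=True on a 2-D list of lists.

    Adds values into base[rows[i]] for every i. A flat `values` list is
    expanded (broadcast) to one row per index; duplicate indices all count.
    """
    ncols = len(base[0])
    # Expand a single row of values so there is one row per index.
    if values and not isinstance(values[0], list):
        if len(values) != ncols:
            raise ValueError(
                f"values of length {len(values)} cannot be expanded "
                f"to rows of length {ncols}"
            )
        values = [values] * len(rows)
    # Shape check: after expansion there must be one value row per index.
    if len(values) != len(rows):
        raise ValueError(
            f"got {len(values)} value rows for {len(rows)} indices"
        )
    out = [row[:] for row in base]  # leave the input untouched
    for r, v in zip(rows, values):
        for c in range(ncols):
            out[r][c] += v[c]
    return out


base = [[0.0, 0.0] for _ in range(3)]
# Row 0 appears twice, so it receives the values twice.
result = index_put_accumulate(base, [0, 2, 0], [1.0, 2.0])
print(result)  # [[2.0, 4.0], [0.0, 0.0], [1.0, 2.0]]
```

The model returns a new list rather than mutating in place, which keeps the sketch easy to check; the real index_put_ is an in-place operation.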
cc zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66790
Reviewed By: mruberry
Differential Revision: D31843566
Pulled By: ngimel
fbshipit-source-id: c9e5d12a33e1067619c210174ba6e3cd66d5718b