[13/N] Update gather with CPU/CUDA implementations (#86409)
Differential Revision: [D40181612](https://our.internmc.facebook.com/intern/diff/D40181612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86409
Approved by: https://github.com/kwen2501