Wrap indirect indexing on CUDA (#105055)
Lifting this to CPU should be rather easy. @jgong5
Partially fixes https://github.com/pytorch/pytorch/issues/97365. I'd wait to close that issue once this works on CPU as well.
This fix works with dynamic shapes as well.
@voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105055
Approved by: https://github.com/peterbell10, https://github.com/jansel