[PyTorch] Migrate CUDA indexing TensorIterator usage to borrowing (#58277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58277
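Borrowing is more efficient because the TensorIterator does not take its own reference to each operand, which avoids reference-count traffic. It is safe here because, in every call site touched, the TensorIterator does not outlive its input and output Tensors.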
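The owning/borrowing trade-off can be sketched outside ATen. The snippet below is only an analogy under stated assumptions: `FakeTensor`, `OwningIter`, and `BorrowingIter` are hypothetical names that do not appear in this change, and `std::shared_ptr` stands in for a refcounted tensor handle. It shows that an owning operand bumps the reference count, while a borrowing operand leaves it alone but is valid only while the caller's handle stays alive.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a refcounted tensor handle; not an ATen type.
using FakeTensor = std::shared_ptr<std::vector<float>>;

// Owning operand: copies the handle, so the reference count is
// incremented on construction and decremented on destruction.
struct OwningIter {
  FakeTensor operand;
  explicit OwningIter(const FakeTensor& t) : operand(t) {}
  long refs() const { return operand.use_count(); }
};

// Borrowing operand: stores only a pointer to the caller's handle.
// No reference-count change, but the caller's handle must outlive
// this object or the pointer dangles.
struct BorrowingIter {
  const FakeTensor* operand;
  explicit BorrowingIter(const FakeTensor& t) : operand(&t) {}
  long refs() const { return operand->use_count(); }
};

// Reference count observed while an owning iterator is alive.
long refs_while_owned() {
  FakeTensor t = std::make_shared<std::vector<float>>(4, 1.0f);
  OwningIter it(t);
  return it.refs();
}

// Reference count observed while a borrowing iterator is alive.
// `t` is declared before `it`, so it is destroyed after `it`,
// which is the lifetime condition the summary relies on.
long refs_while_borrowed() {
  FakeTensor t = std::make_shared<std::vector<float>>(4, 1.0f);
  BorrowingIter it(t);
  return it.refs();
}
```

In this sketch the owning variant holds a second reference for the lifetime of the iterator, and the borrowing variant holds none, which is why the lifetime argument in the summary has to be checked per call site.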
ghstack-source-id: 129002044
Test Plan: Existing CI
Reviewed By: ngimel
Differential Revision: D28428441
fbshipit-source-id: 243b746aeb5fdf8b95c8e591c066c5eab140deb6