[PyTorch] Migrate remaining CUDA TensorIterator usage to borrowing where possible (#58278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58278
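Borrowing is more efficient than owning because it avoids reference-count increments and decrements on the operand Tensors. It is safe here because, in every case touched by this change, the TensorIterator does not outlive its input and output Tensors.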
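As a rough illustration of the trade-off described above, here is a minimal standalone sketch. It does not use ATen: `FakeTensor`, `OwningIter`, `BorrowingIter`, and the two helper functions are hypothetical names, with `std::shared_ptr` standing in for a reference-counted Tensor handle. It only shows that an owning holder changes the reference count while a borrowing holder does not, and that the borrow is valid only while the caller's handle is alive.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference-counted Tensor handle (not the ATen type).
using FakeTensor = std::shared_ptr<std::vector<float>>;

// Owning holder: copying the handle increments the reference count,
// and destroying the holder decrements it again.
struct OwningIter {
  FakeTensor input;
  explicit OwningIter(const FakeTensor& t) : input(t) {}
};

// Borrowing holder: stores a non-owning pointer to the caller's handle,
// so there is no reference-count traffic. This is only valid while the
// caller's handle outlives the holder.
struct BorrowingIter {
  const FakeTensor* input;
  explicit BorrowingIter(const FakeTensor& t) : input(&t) {}
};

// Reference count observed while an owning holder is alive.
long use_count_while_owned(const FakeTensor& t) {
  OwningIter it(t);
  (void)it;
  return t.use_count();
}

// Reference count observed while a borrowing holder is alive.
long use_count_while_borrowed(const FakeTensor& t) {
  BorrowingIter it(t);
  (void)it;
  return t.use_count();
}
```

In both helpers the holder is a local that is destroyed before the caller's handle, which mirrors the lifetime condition stated above for the borrowing case.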
ghstack-source-id: 129002042
Test Plan: Existing CI
Reviewed By: ezyang
Differential Revision: D28428809
fbshipit-source-id: 23ccf508c4413371a88085271f11c7d0cc861a9e