Remove unnecessary tensor copies (#33732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732
move and forward instead of copy
Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
ghstack-source-id: 99161486
Test Plan: benchmarks
Differential Revision: D20082642
fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847