Make TensorIterator stack-allocated, relying on named return value optimization (NRVO) to elide copies.
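
For readers unfamiliar with the technique, here is a minimal sketch of returning a stack-allocated object by value. `ToyIterator` and `make_iterator` are hypothetical names used only for illustration; they are not the real TensorIterator API, and the copy/move counters exist only to make the behavior observable.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an iterator-like object (NOT at::TensorIterator).
struct ToyIterator {
  std::vector<std::int64_t> shape;

  static int copies;  // number of copy-constructor calls
  static int moves;   // number of move-constructor calls

  ToyIterator() = default;
  ToyIterator(const ToyIterator& other) : shape(other.shape) { ++copies; }
  ToyIterator(ToyIterator&& other) noexcept : shape(std::move(other.shape)) {
    ++moves;
  }
};

int ToyIterator::copies = 0;
int ToyIterator::moves = 0;

// Builds the object as a named local and returns that same local on every
// path. This shape makes the function eligible for NRVO: the compiler may
// construct `iter` directly in the caller's storage. NRVO is permitted, not
// guaranteed; if it is not applied, the return falls back to a move, never
// a copy.
ToyIterator make_iterator(std::int64_t n) {
  ToyIterator iter;
  iter.shape.push_back(n);
  return iter;
}
```

A caller writes `ToyIterator it = make_iterator(8);` and gets an object with automatic storage, with no heap allocation for the object itself and no copy-constructor call. Returning different named locals from different branches would typically defeat NRVO, which is why the sketch keeps a single local.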
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22519
Differential Revision: D16451460
fbshipit-source-id: 6ca6ae2fdf1af5a2f792b42e55279413971b3c46