[PyTorch] Avoid extra Tensor refcounting in _cat_out_cpu (#49364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49364
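We had a local `Tensor` when we only needed a `const Tensor&`. Copying a `Tensor` increments its atomic reference count (and decrements it again on destruction), so binding a reference instead avoids that extra refcount traffic.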
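
A minimal sketch of the pattern, not the actual `_cat_out_cpu` diff. It uses `std::shared_ptr<int>` as a stand-in for `Tensor` so the reference count is observable with the standard library alone; `Handle`, `use_count_with_local_copy`, and `use_count_with_const_ref` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a refcounted Tensor handle (assumption: illustration only,
// not PyTorch's real type).
using Handle = std::shared_ptr<int>;

// Before: a local copy of the handle bumps the shared reference count.
long use_count_with_local_copy(const std::vector<Handle>& inputs) {
  Handle first = inputs[0];  // copy: atomic refcount increment
  return first.use_count();  // observes the extra owner
}                            // destructor: atomic refcount decrement

// After: a const reference aliases the existing handle, no refcount change.
long use_count_with_const_ref(const std::vector<Handle>& inputs) {
  const Handle& first = inputs[0];  // no copy, no refcount traffic
  return first.use_count();
}
```

With a vector holding one handle, the copying version should see one more owner than the reference version, and the count returns to its original value once the local copy is destroyed.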
ghstack-source-id: 118624595
Test Plan: Internal benchmark.
Reviewed By: hlu1
Differential Revision: D25544731
fbshipit-source-id: 7b9656d0371ab65a6313cb0ad4aa1df707884c1c