[fix] torch.cat : cross-device check for out and input tensors (#53004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52044 (`stack` dispatches to `cat`)
Because of how the dispatcher works, this case currently arises only in the CUDA kernel (the CPU kernel is chosen if all inputs and `out` are on CPU). That is why the check is added only on the CUDA side.
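
As a rough illustration of the kind of check described above, here is a minimal pure-Python sketch. It is not the code added in this PR: `FakeTensor` and `check_same_device` are hypothetical names, devices are modeled as plain strings, and the error messages are invented for the example.

```python
# Illustrative sketch only: FakeTensor and check_same_device are hypothetical
# names, not PyTorch APIs. Devices are modeled as plain strings.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FakeTensor:
    device: str  # e.g. "cpu" or "cuda:0"


def check_same_device(inputs: Sequence[FakeTensor], out: FakeTensor) -> None:
    """Raise if any input or `out` is on a different device than inputs[0]."""
    if not inputs:
        raise ValueError("expected a non-empty list of tensors")
    expected = inputs[0].device
    # Every input must share the device of the first input.
    for i, t in enumerate(inputs):
        if t.device != expected:
            raise RuntimeError(f"input {i} is on {t.device}, expected {expected}")
    # The out tensor must be on that same device as well.
    if out.device != expected:
        raise RuntimeError(f"out is on {out.device}, expected {expected}")
```

With this sketch, passing CUDA inputs together with a CPU `out` (or mixing CPU and CUDA inputs) raises a `RuntimeError`, while arguments that all share one device pass silently.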
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53004
Reviewed By: albanD
Differential Revision: D27003956
Pulled By: mruberry
fbshipit-source-id: 818ea0f76153f4fa281740f30705e5ef018413f6