Prefer at::detail::empty_cuda to the native function (#70618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70618
`at::native::empty_cuda` is called directly in some places to avoid
the extra dispatch; however, it lacks features like device guards and a
`TensorOptions` overload, which `at::detail::empty_cuda` provides.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33623676
Pulled By: ngimel
fbshipit-source-id: 3ac56c4f8acc90281323195a34fc0a1ef8148fbe
(cherry picked from commit 4aaf8b29d0de927ec9ced9f8749a96b2be9c4a89)