Work around segfault in deviceGuard construction (#41621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41621
Per title. In some situations, the deviceGuard constructor in mul_kernel_cuda segfaults, so construct the deviceGuard only when the first argument is a scalar.
This does not root-cause the segfault in the deviceGuard constructor, so the issue might come back.
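For illustration only, a minimal sketch of the conditional-construction pattern described above. It is not the actual mul_kernel_cuda code: ScopedDeviceGuard, g_current_device, and run_kernel are hypothetical stand-ins, and holding the guard in a std::optional is an assumption about how one might defer construction.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the process-wide "current device" index.
static int g_current_device = 0;

// Hypothetical RAII guard (not the PyTorch class): switches the current
// device on construction and restores the previous one on destruction.
class ScopedDeviceGuard {
 public:
  explicit ScopedDeviceGuard(int device) : previous_(g_current_device) {
    g_current_device = device;
  }
  ~ScopedDeviceGuard() { g_current_device = previous_; }
  ScopedDeviceGuard(const ScopedDeviceGuard&) = delete;
  ScopedDeviceGuard& operator=(const ScopedDeviceGuard&) = delete;

 private:
  int previous_;
};

// Hypothetical kernel entry point. The guard is constructed only when the
// first argument is a scalar; otherwise its constructor never runs.
// Returns the device that was current while the "kernel" body executed.
int run_kernel(bool first_arg_is_scalar, int tensor_device) {
  std::optional<ScopedDeviceGuard> guard;  // empty: no constructor call yet
  if (first_arg_is_scalar) {
    guard.emplace(tensor_device);  // construct in place only on this path
  }
  return g_current_device;  // guard (if any) is destroyed on return
}
```

With this shape, the non-scalar path never enters the guard constructor at all, which is why the change can avoid the crash without explaining it.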
Test Plan: pytorch oss CI
Reviewed By: jianyuh
Differential Revision: D22616460
fbshipit-source-id: b91bbe55c6eb0bbe80b8d6a61c41f09288752658