Fix CUDA device guard usage when first arg of kernel is scalar (#39870)
Summary:
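Add an OptionalDeviceGuard on the device of the second argument in gpu_kernel_with_scalars when the first argument is a scalar, so that the kernel is launched on the device of the tensor operand.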
Closes https://github.com/pytorch/pytorch/issues/38889
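
For illustration only, a minimal sketch of the guard pattern described above. This is not the actual PyTorch code: `g_current_device`, `OptionalDeviceGuardSketch`, `launch_on_current_device`, and `launch_scalar_first` are hypothetical stand-ins, and the CUDA runtime's current device is simulated with a plain integer. It assumes C++17 for `std::optional`.

```cpp
#include <optional>

// Hypothetical stand-in for the CUDA runtime's notion of "current device".
static int g_current_device = 0;

// Simplified RAII guard (assumption, not the real c10 class): if a device
// is given, switch to it and restore the previous device on destruction;
// if no device is given, do nothing.
class OptionalDeviceGuardSketch {
 public:
  explicit OptionalDeviceGuardSketch(std::optional<int> device)
      : previous_(g_current_device), active_(device.has_value()) {
    if (active_) g_current_device = *device;
  }
  ~OptionalDeviceGuardSketch() {
    if (active_) g_current_device = previous_;
  }
  OptionalDeviceGuardSketch(const OptionalDeviceGuardSketch&) = delete;
  OptionalDeviceGuardSketch& operator=(const OptionalDeviceGuardSketch&) = delete;

 private:
  int previous_;
  bool active_;
};

// Hypothetical "kernel launch": reports the device the kernel would run on.
int launch_on_current_device() { return g_current_device; }

// Scalar-first path: the scalar operand carries no CUDA device, so the
// guard is taken on the device of the remaining tensor operand.
int launch_scalar_first(std::optional<int> tensor_device) {
  OptionalDeviceGuardSketch guard(tensor_device);
  return launch_on_current_device();
}
```

In this sketch, calling `launch_scalar_first(1)` while device 0 is current runs the simulated launch on device 1 and restores device 0 once the guard goes out of scope.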
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39870
Differential Revision: D22011184
Pulled By: ngimel
fbshipit-source-id: 427291c456e879f25d15ab76a60b5d4ad61f3b3f