fix 0d cpu tensor handling when it's the first arg (#87273)
Fixes https://github.com/pytorch/torchdynamo/issues/1681
When at least one of the pointwise args is on cuda, set the device to cuda. We assume that cases of true device mismatch have already been weeded out during tracing, so what remains is interop between a 0d cpu tensor and a cuda tensor.
Also fix the 0d tensor test that previously wasn't compiling with dynamo.
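The device-selection rule described above can be sketched roughly as follows. This is a standalone illustration, not the code changed in this PR: `ArgInfo` and `pick_pointwise_device` are hypothetical names, and the sketch assumes (as the description does) that genuine device mismatches were already rejected during tracing.

```python
from typing import NamedTuple, Sequence


class ArgInfo(NamedTuple):
    """Hypothetical stand-in for a traced pointwise argument."""
    device: str  # "cpu" or "cuda"
    ndim: int    # 0 for a 0d (scalar) tensor


def pick_pointwise_device(args: Sequence[ArgInfo]) -> str:
    """Pick the device for a pointwise op from its arguments.

    Assumption: true device mismatches were rejected earlier, so any cpu
    argument seen next to a cuda argument is a 0d cpu tensor.
    """
    if not args:
        raise ValueError("expected at least one argument")
    # If any argument is on cuda, the op runs on cuda, regardless of
    # which position that argument occupies.
    for arg in args:
        if arg.device == "cuda":
            return "cuda"
    # Otherwise fall back to the first argument's device.
    return args[0].device
```

With this rule, a 0d cpu tensor in the first position no longer decides the device: `pick_pointwise_device([ArgInfo("cpu", 0), ArgInfo("cuda", 2)])` yields `"cuda"`, the same as when the arguments are swapped.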
cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87273
Approved by: https://github.com/soumith, https://github.com/voznesenskym
Author: Natalia Gimelshein