Enable `out` OpInfo testing for `torch.where` (#121473)
Also fix a behavior discrepancy between CPU and CUDA by raising an error when `out.dtype` is unexpected.
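
A minimal sketch of how the `out` dtype handling can be observed from Python. `try_where_out` is a hypothetical helper written for this illustration, not code from the PR. It assumes "unexpected" means a dtype other than the promoted dtype of the two value tensors, and it reports whatever the installed build does rather than asserting a particular outcome for the mismatched case.

```python
import torch


def try_where_out(out_dtype, device="cpu"):
    """Call torch.where with an `out` tensor of `out_dtype` and report the outcome.

    Hypothetical helper for illustration only.
    """
    cond = torch.tensor([True, False, True], device=device)
    a = torch.tensor([1.0, 2.0, 3.0], device=device)     # float32 values
    b = torch.tensor([10.0, 20.0, 30.0], device=device)  # float32 values
    out = torch.empty(3, dtype=out_dtype, device=device)
    try:
        torch.where(cond, a, b, out=out)
    except RuntimeError as exc:
        # Builds that include the dtype check are expected to land here
        # when `out_dtype` differs from the result dtype (assumption).
        return "error", str(exc)
    return "ok", out.tolist()


# `out` matches the result dtype (float32): this succeeds.
print(try_where_out(torch.float32))

# `out` has a different dtype (int64): outcome depends on the build.
print(try_where_out(torch.int64))
```

Passing `device="cuda"` on a machine with a GPU runs the same comparison there, which is one way to check that both backends now agree.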
Fixes https://github.com/pytorch/pytorch/issues/121397
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121473
Approved by: https://github.com/Skylion007, https://github.com/albanD