Mark diag.out composite (#88670)
Its implementation just redispatches, so it works for more than CPU/CUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88670
Approved by: https://github.com/anjali411