[Resubmission] fix mul_out CUDA config for COO tensors (#80254)
Fixes https://github.com/pytorch/pytorch/issues/79914
Duplicate of https://github.com/pytorch/pytorch/pull/79937, resubmitted because I wasn't able to push changes to the existing PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254
Approved by: https://github.com/eellison