Don't skip NCCL backend when testing all_reduce_cuda (#48231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48231
Noticed that these tests were being skipped with the NCCL backend, but
there doesn't appear to be a valid reason to skip them. Enabled these tests
and verified that they pass with 500 stress runs.
ghstack-source-id: 117085209
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D25079030
fbshipit-source-id: 8204288ffbd387375a1a86fe8c07243cfd855549