Test distributed collectives profiling with Gloo on GPU (#49072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49072
As the title says, this enables the distributed collectives profiling tests for Gloo when they run on GPU with the profiler enabled via `use_cuda=True`. Enabling the ProcessGroupNCCL profiling tests to work with `use_cuda=True` is tracked in https://github.com/pytorch/pytorch/issues/48987.
ghstack-source-id: 118789003
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D25388986
fbshipit-source-id: 664d922ac2e10c77299daebdc6d3c92bb70eb56e