[BE] Speed up runtime of test_ddp_model_diff_across_ranks (#55659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55659
As per https://github.com/pytorch/pytorch/issues/55583, this is the most expensive distributed test.
Instead of waiting for nccl_async_error_handling to take down process 0 in
this test, skip the barrier when the backend is NCCL and let the process exit.
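A minimal sketch of the idea, not the actual test code: the helper name
`finish_rank` and its parameters are hypothetical, and the barrier is passed in
as a callable so the snippet runs without a process group. In a real test the
arguments would presumably be `dist.get_backend()` and `dist.barrier`.

```python
from typing import Callable


def finish_rank(backend: str, barrier: Callable[[], None]) -> bool:
    """Run the end-of-test barrier unless the backend is NCCL.

    Returns True if the barrier was called, False if it was skipped.
    Hypothetical helper for illustration only.
    """
    # With NCCL, a rank blocked on a barrier its peers never reach is only
    # released once async error handling tears the process down, which is
    # slow. Skipping the barrier lets the rank exit right away.
    if backend.lower() == "nccl":
        return False
    barrier()
    return True
```

Other backends still hit the barrier, so their behavior is unchanged.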
A slight downside here is that the test no longer verifies that the process
would be brought down by nccl_async_error_handling, but
nccl_async_error_handling is already well tested in other tests. If we feel we
need to ensure this for this test, then we can pass in a process group with a
smaller timeout as an alternative solution.
The test now runs in 4-6s instead of 70s. Ran the test 1000 times to verify
there is no flakiness.
ghstack-source-id: 126590904
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27672161
fbshipit-source-id: 38fb518606daac9b0390ca4c3ce1a72dc2da36fc