DeepSpeed
c37fe9cb - Fix exception handling in get_all_ranks_from_group() function (#4862)

Commit
1 year ago
Fix exception handling in get_all_ranks_from_group() function (#4862)

In the latest PyTorch nightly, the exception raised by `torch.distributed.distributed_c10d.get_global_rank()` changed from `RuntimeError` to `ValueError`, so we need to update our try/except in `deepspeed.comm`.

Tested with torch version 2.3.0.dev20231221+cu121.

Fixes: https://github.com/microsoft/DeepSpeed/issues/4853
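
The pattern the commit message describes can be illustrated with a minimal sketch. This is not the actual diff to `comm.py`: `collect_group_ranks` and `make_fake_lookup` are hypothetical names, and the rank lookup is passed in as a plain callable so the example runs without a distributed backend. The sketch assumes the function enumerates a group's members by probing group-local ranks until the lookup raises, and shows why catching both exception types keeps that loop working across PyTorch versions.

```python
from typing import Callable, List, Sequence, Type


def collect_group_ranks(lookup: Callable[[int], int]) -> List[int]:
    """Probe group-local ranks 0, 1, 2, ... until the lookup raises.

    `lookup` is a stand-in (assumption) for a call such as
    get_global_rank(group, local_rank) on a real process group.
    """
    ranks: List[int] = []
    local_rank = 0
    try:
        while True:
            ranks.append(lookup(local_rank))
            local_rank += 1
    except (RuntimeError, ValueError):
        # Per the commit message, older PyTorch raises RuntimeError for a
        # rank outside the group and newer nightlies raise ValueError.
        # Catching both ends the probe cleanly on either version.
        pass
    return ranks


def make_fake_lookup(global_ranks: Sequence[int],
                     exc_type: Type[Exception]) -> Callable[[int], int]:
    """Hypothetical test double that raises `exc_type` past the last member."""
    def lookup(local_rank: int) -> int:
        if local_rank >= len(global_ranks):
            raise exc_type(f"group rank {local_rank} is not part of the group")
        return global_ranks[local_rank]
    return lookup


if __name__ == "__main__":
    for exc in (RuntimeError, ValueError):
        print(exc.__name__, collect_group_ranks(make_fake_lookup([4, 5, 6], exc)))
```

Catching a tuple of exception types keeps the behavior identical on both old and new PyTorch builds without a version check. Any other exception type still propagates to the caller.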
  • deepspeed/comm
    • File
      comm.py