pytorch
95575f0a - [DTensor] Fix _get_or_create_default_group() (#96961)

Commit
1 year ago
Summary:
This PR fixes `_get_or_create_default_group()` of `DeviceMesh`. When the `mesh` of the first created `DeviceMesh` is not `[0, 1, 2, ... WORLD_SIZE - 1]` and `is_initialized() == False`, the function fails its own assertions even though the mesh is valid. This PR fixes the issue by removing those assertions.

---

More specifically, `_get_or_create_default_group()` has 4 checks:

1. `DeviceMesh must include every process in WORLD`
2. `DeviceMesh cannot have duplicate values`
3. `DeviceMesh ranks must start from 0`
4. `DeviceMesh should have all ranks of WORLD`

Checks 1, 3, and 4 are not satisfied when `self.mesh` is not `[0, 1, 2, ... WORLD_SIZE - 1]`. Check 2 is valid, but `__init__()` already performs it, so it does not need to be repeated in this function.

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D44098849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96961
Approved by: https://github.com/wanchaol
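To illustrate why checks 1, 3, and 4 reject a valid sub-mesh, here is a minimal plain-Python sketch. It is not the PyTorch implementation: `failed_checks` is a hypothetical helper, and the condition behind each message is an assumed reading of the assertion text quoted above.

```python
def failed_checks(mesh_ranks, world_size):
    """Return the messages of the checks that a flat list of ranks would fail.

    Hypothetical helper: each condition is an assumed interpretation of the
    corresponding assertion message, not code taken from DeviceMesh.
    """
    failures = []
    # Check 1 (assumed): the mesh has exactly as many entries as WORLD has processes.
    if len(mesh_ranks) != world_size:
        failures.append("DeviceMesh must include every process in WORLD")
    # Check 2 (assumed): no rank appears twice.
    if len(set(mesh_ranks)) != len(mesh_ranks):
        failures.append("DeviceMesh cannot have duplicate values")
    # Check 3 (assumed): the smallest rank in the mesh is 0.
    if min(mesh_ranks) != 0:
        failures.append("DeviceMesh ranks must start from 0")
    # Check 4 (assumed): the mesh covers exactly the ranks 0 .. world_size - 1.
    if set(mesh_ranks) != set(range(world_size)):
        failures.append("DeviceMesh should have all ranks of WORLD")
    return failures


# A mesh covering the whole world passes every check.
full_mesh = failed_checks([0, 1, 2, 3], world_size=4)

# A sub-mesh over ranks 1 and 3 has no duplicates, yet it trips checks 1, 3, and 4.
sub_mesh = failed_checks([1, 3], world_size=4)
```

Under these assumptions, only the duplicate check (2) distinguishes a malformed mesh from a legitimate sub-mesh, which is consistent with the PR dropping the other three.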