pytorch
e3bf5000 - Hide the contiguous requirement for user input mesh when initializing DeviceMesh (#110628)

Commit

1 year ago

Hide the contiguous requirement for user input mesh when initializing DeviceMesh (#110628)

Summary: As the title says, this diff hides the contiguous requirement for the user input mesh when initializing DeviceMesh. In the current implementation, when testing with inter-node model parallelism, an exception is thrown during mesh validation when the following input is provided:

```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda", mesh.contiguous(), mesh_dim_names=("dp", "mp")
)
```

Test Plan:

**Unit Test**:

```
buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:device_mesh -- test_validate_device_mesh
Test UI: https://www.internalfb.com/intern/testinfra/testrun/3940649876878399
Network: Up: 0B Down: 0B
Jobs completed: 6. Time elapsed: 1:58.7s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```

**Test with MP**

```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda", mesh.contiguous(), mesh_dim_names=("dp", "mp")
)
```

Without the change: exception.
After this change: initialized successfully.

Differential Revision: D49942839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110628
Approved by: https://github.com/wanchaol, https://github.com/xw285cornell, https://github.com/fduwjj
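For readers unfamiliar with the contiguity requirement mentioned in the summary, the following is a minimal pure-Python sketch of why a transposed mesh is not contiguous. It models a tensor only by its shape and element strides; it does not call PyTorch and is not the DeviceMesh validation code. The helper names (`row_major_strides`, `is_contiguous`) and the sizes `mp_size = 2`, `dp_size = 4` are assumptions chosen for illustration.

```python
def row_major_strides(shape):
    """Element strides of a densely packed, row-major array of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Simplified check: strides must equal the dense row-major strides.

    Real tensor libraries also ignore the strides of size-1 dimensions;
    this sketch does not handle that case.
    """
    return tuple(strides) == row_major_strides(shape)


# Hypothetical sizes: 2 model-parallel groups, 4 data-parallel replicas.
mp_size, dp_size = 2, 4
world_size = mp_size * dp_size

# Equivalent of arange(0, world_size).view(mp_size, dp_size).
shape = (mp_size, dp_size)
strides = row_major_strides(shape)

# A transpose swaps shape and strides but leaves the data where it is.
t_shape, t_strides = shape[::-1], strides[::-1]

# A contiguous copy re-packs the data, giving dense strides again.
c_strides = row_major_strides(t_shape)

# The rank layout of the transposed ("dp", "mp") mesh.
mesh = [[r * dp_size + c for c in range(dp_size)] for r in range(mp_size)]
transposed = [list(col) for col in zip(*mesh)]

print(is_contiguous(shape, strides))      # original view
print(is_contiguous(t_shape, t_strides))  # transposed view
print(is_contiguous(t_shape, c_strides))  # after a contiguous copy
print(transposed)
```

In this sketch each row of `transposed` is one "mp" group, and its ranks are `dp_size` apart, which is the kind of layout that arises when a model-parallel group spans nodes.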

Author

yoyoyocmu

Committer

pytorchmergebot

Parents

a0bbd075
