Determine collective device via _get_pg_default_device rather than explicit CUDA (#101533)
FSDP's state dict performs many communication operations on ShardedTensors. These use the externally passed-in process group (or the default one), which is currently assumed to support CUDA devices: before communication, tensors are implicitly moved to CUDA. This move is really about matching the memory type the process group requires, not the compute device type. As a result, when users run FSDP on a custom backend and pass in a custom process group that does not support CUDA devices, FSDP can fail in some cases. This PR instead queries the device type supported by the process group via _get_pg_default_device at communication time and moves data to that device only when needed.
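A minimal sketch of the pattern this change introduces, assuming a process group has already been initialized. `_get_pg_default_device` is a private helper in `torch.distributed.distributed_c10d`; the helper function and tensor setup below are illustrative, not code from the PR:

```python
import torch
import torch.distributed as dist
from torch.distributed.distributed_c10d import _get_pg_default_device


def all_gather_on_pg_device(local_tensor, pg=None):
    # Ask the process group which device type its backend communicates on
    # (e.g. cuda for NCCL, cpu for Gloo, or a custom backend's device),
    # instead of hard-coding an implicit .cuda() move.
    pg_device = _get_pg_default_device(pg)

    # Move to the pg's device only when the tensor is not already there.
    if local_tensor.device.type != pg_device.type:
        local_tensor = local_tensor.to(pg_device)

    out = [torch.empty_like(local_tensor) for _ in range(dist.get_world_size(pg))]
    dist.all_gather(out, local_tensor, group=pg)
    return out
```

With this pattern, a custom backend whose process group reports a non-CUDA device gets its tensors moved to that device rather than unconditionally to CUDA.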
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101533
Approved by: https://github.com/awgu