pytorch
a5886487 - [DCP] Fix 'torch.cpu' has no attribute 'current_device' in checkpoint/optimizer.py (#110299)

Commit

1 year ago

[DCP] Fix 'torch.cpu' has no attribute 'current_device' in checkpoint/optimizer.py (#110299)

When running on the "gloo" and "cpu:gloo,cuda:nccl" backends, loading a sharded optimizer state dict runs into the following error.

```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/data/users/irisz/pytorch/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py", line 105, in run_fsdp_checkpoint_example
    optim_state = load_sharded_optimizer_state_dict(
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 295, in load_sharded_optimizer_state_dict
    _alloc_tensor(value.properties, value.size, dp_pg_device_type), sharding_spec
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 109, in _alloc_tensor
    device=cast(torch.device, _get_device_module(device_type).current_device()),
AttributeError: module 'torch.cpu' has no attribute 'current_device'
```

This PR fixes the error in optimizer.py. A follow-up will add "cpu:gloo,cuda:nccl" support in DTensorBase so that the unit test can be updated to include this backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110299
Approved by: https://github.com/kumpera
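The traceback shows that `_alloc_tensor` calls `current_device()` on whatever module `_get_device_module(device_type)` returns, which fails when that module does not expose the function. The sketch below illustrates one way such a call could be guarded; it is not the actual patch from this PR. The helper name `resolve_device_str` and the stand-in module objects are hypothetical, and the code deliberately avoids importing torch so that it runs anywhere.

```python
from types import SimpleNamespace
from typing import Any


def resolve_device_str(device_type: str, device_module: Any) -> str:
    """Return a device string such as "cpu" or "cuda:1".

    Hypothetical helper: falls back to the bare device type when the
    device is "cpu" or the module has no `current_device` attribute,
    instead of calling `current_device()` unconditionally.
    """
    if device_type == "cpu" or not hasattr(device_module, "current_device"):
        return device_type
    return f"{device_type}:{device_module.current_device()}"


# Stand-ins for device modules (assumptions for illustration only).
# `fake_cpu` mimics a module without `current_device`, as in the traceback.
fake_cpu = SimpleNamespace()
# `fake_cuda` mimics a module whose current device index is 1.
fake_cuda = SimpleNamespace(current_device=lambda: 1)

cpu_device = resolve_device_str("cpu", fake_cpu)
cuda_device = resolve_device_str("cuda", fake_cuda)
print(cpu_device, cuda_device)
```

With real PyTorch, the resulting string could be passed as the `device` argument when allocating the tensor, for example `torch.empty(size, device=resolve_device_str(device_type, module))`, assuming `module` is the device module for `device_type`.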

Author

wz337

Committer

pytorchmergebot

Parents

13af952f
