pytorch
f58ba553 - [ROCm] Fix distributed tests failure and enable ROCm distributed CI (#92932)

Commit
2 years ago
Distributed tests fail with AttributeError: 'torch._C._distributed_c10d.ProcessGroup' object has no attribute '_set_backend' when running distributed/test_c10d_spawn_gloo.py. The tests then stop progressing, which results in a hang.

Use _register_backend instead of _set_backend.

Fixes https://github.com/pytorch/pytorch/pull/91632

More details of the issue: https://github.com/pytorch/pytorch/pull/91632#issuecomment-1402831950 and https://github.com/pytorch/pytorch/pull/91632#issuecomment-1405646977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92932
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet, https://github.com/H-Huang
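The failure mode in the message above (calling a method name that the object does not expose) can be reproduced without PyTorch. The sketch below is only an illustration: FakeProcessGroup, attach_backend, and the (device, backend) argument list are hypothetical stand-ins, not torch's ProcessGroup or the real signature of _register_backend. The helper also probes for both method names, which is not what this commit does; the commit simply switches the call to _register_backend.

```python
class FakeProcessGroup:
    """Hypothetical stand-in for illustration; NOT torch's ProcessGroup."""

    def __init__(self):
        self.backends = {}

    def _register_backend(self, device, backend):
        # Assumed signature, chosen only for this sketch.
        self.backends[device] = backend


def attach_backend(pg, device, backend):
    """Call whichever registration method the object exposes.

    Returns the name of the method that was used.
    """
    for name in ("_register_backend", "_set_backend"):
        method = getattr(pg, name, None)
        if method is not None:
            method(device, backend)
            return name
    raise AttributeError("process group exposes no known registration method")


pg = FakeProcessGroup()

# Calling the missing method raises AttributeError, as in the failing tests.
error_message = None
try:
    pg._set_backend("cpu", "gloo")
except AttributeError as exc:
    error_message = str(exc)

# The helper finds the method that does exist and uses it.
used = attach_backend(pg, "cpu", "gloo")
```

In a multi-process test, an uncaught exception like this in one worker can leave the other workers waiting, which is consistent with the hang the message describes.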