pytorch
ad4f51d7 - Shard rocm distributed tests

Commit
2 years ago
Shard rocm distributed tests

As ROCm has enabled more distributed tests on trunk, the test time has increased dramatically and now exceeds the 5-hour timeout threshold. This is a mitigation.

Note that distributed tests take about 2.5 hours on a Linux machine, but they now take over 5 hours on ROCm. This can be observed by paging through https://hud.pytorch.org/hud/pytorch/pytorch/master/3?name_filter=rocm: after the FSDP tests were enabled, along with some other changes, ROCm distributed test time increased by more than 1 hour.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76536
Approved by: https://github.com/zengk95
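The commit message does not say how the distributed tests are split or into how many shards. As a rough illustration of why sharding helps with a single job running past the 5-hour timeout, here is a minimal sketch that assumes per-test duration estimates are available and uses a greedy longest-first assignment to the least-loaded shard. The function names `shard_tests` and `shard_times`, the suite names, and all durations are hypothetical. This is not PyTorch's CI code.

```python
import heapq
from typing import Dict, List, Tuple


def shard_tests(durations: Dict[str, float], num_shards: int) -> List[List[str]]:
    """Hypothetical helper: greedily assign tests to shards to balance runtime.

    durations maps a test name to an estimated runtime in hours (assumed input).
    Returns one list of test names per shard.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    # Min-heap of (accumulated_hours, shard_index): the least-loaded shard pops first.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(num_shards)]
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    # Longest tests first; ties broken by name so the result is deterministic.
    for name, hours in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + hours, idx))
    return shards


def shard_times(durations: Dict[str, float], shards: List[List[str]]) -> List[float]:
    """Hypothetical helper: total estimated hours per shard."""
    return [sum(durations[name] for name in shard) for shard in shards]


if __name__ == "__main__":
    # Illustrative numbers only, not measured CI timings: 5.4h total in one job.
    durations = {
        "dist_suite_a": 1.8,
        "dist_suite_b": 1.5,
        "dist_suite_c": 1.2,
        "dist_suite_d": 0.9,
    }
    timeout_hours = 5.0
    shards = shard_tests(durations, num_shards=2)
    for i, (shard, total) in enumerate(zip(shards, shard_times(durations, shards))):
        print(f"shard {i + 1}: {total:.1f}h {shard}")
        # Each shard now fits under the timeout even though the whole suite does not.
        assert total < timeout_hours
```

With these made-up numbers, the 5.4-hour suite splits into two shards of about 2.7 hours each. Longest-first greedy is a common heuristic for this kind of balancing, but it is not guaranteed to be optimal.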