Shard ROCm distributed tests
As more distributed tests have been enabled for ROCm on trunk, the ROCm distributed test time has increased dramatically and now exceeds the 5-hour timeout.
This is a mitigation. For comparison, the distributed tests take roughly 2.5 hours on a Linux machine but now take over 5 hours on ROCm.
Paging through https://hud.pytorch.org/hud/pytorch/pytorch/master/3?name_filter=rocm shows the jump. After the FSDP tests were enabled, along with some other changes, the ROCm distributed test time increased by more than 1 hour.
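Sharding works by splitting one long job into several shorter ones so that each shard fits under the timeout. The sketch below uses greedy longest-first assignment by test duration to show the idea. It is not the sharding logic PyTorch CI actually uses. The function name `shard_tests`, the test names, and the durations (in hours) are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def shard_tests(test_times: Dict[str, float], num_shards: int) -> List[Tuple[float, List[str]]]:
    """Hypothetical helper: greedily assign tests to shards, longest test first.

    Returns one (total_duration, test_names) tuple per shard.
    """
    # Min-heap of (current total duration, shard index); the least-loaded shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    totals = [0.0] * num_shards
    for name, duration in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        totals[idx] = total + duration
        heapq.heappush(heap, (totals[idx], idx))
    return [(totals[i], shards[i]) for i in range(num_shards)]


# Hypothetical per-test durations in hours; they sum to 5.25, just over the timeout.
TIMEOUT_HOURS = 5.0
times = {"test_a": 2.0, "test_b": 1.5, "test_c": 1.0, "test_d": 0.75}

for n in (1, 2):
    longest = max(total for total, _ in shard_tests(times, n))
    print(n, longest, longest < TIMEOUT_HOURS)
# 1 5.25 False
# 2 2.75 True
```

With one shard, the whole job exceeds the 5-hour limit. With two shards, the longest shard takes 2.75 hours.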
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76536
Approved by: https://github.com/zengk95