Fix rocm sharding (#102871)
ROCm queries for the number of processes it should use per machine, so that number can differ across shards, which leads to inconsistencies when distributing tests among shards.
My solution is to separate the var used for shard calculations from the actual number of procs that can be used, and to ensure that the var used for shard calculations is consistent across all shards for a given test config + job. I believe that the only consequence is that ROCm sharding might become unbalanced, since the calculation no longer reflects the number of procs each machine really has.
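
A minimal sketch of the idea, not the actual change in this PR. All names here (`PROCS_FOR_SHARD_CALC`, `plan_shards`, `select_for_shard`) are hypothetical, and the greedy longest-first planner is an assumed stand-in for the real sharding logic. The point is that the shard plan depends only on a constant shared by every shard, while the locally queried proc count only affects how the selected tests are run.

```python
from typing import Dict, List, Tuple

# Hypothetical constant: the same on every shard of a test config + job,
# regardless of what the local machine reports.
PROCS_FOR_SHARD_CALC = 2

# test name -> (duration in seconds, must_run_serially)
TestInfo = Dict[str, Tuple[float, bool]]


def plan_shards(tests: TestInfo, num_shards: int, procs_for_calc: int) -> List[List[str]]:
    """Greedy longest-first plan. Parallel tests are weighted by
    duration / procs_for_calc, so the plan changes if procs_for_calc changes."""
    def weight(name: str) -> float:
        secs, serial = tests[name]
        return secs if serial else secs / procs_for_calc

    loads = [0.0] * num_shards
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    # Sort by descending weight, then by name for a deterministic order.
    for name in sorted(tests, key=lambda n: (-weight(n), n)):
        target = loads.index(min(loads))  # least-loaded shard so far
        shards[target].append(name)
        loads[target] += weight(name)
    return shards


def select_for_shard(tests: TestInfo, shard_index: int, num_shards: int,
                     local_procs: int) -> Tuple[List[str], int]:
    """Pick this shard's tests using the shared constant; local_procs is
    only returned as the parallelism to run them with."""
    plan = plan_shards(tests, num_shards, PROCS_FOR_SHARD_CALC)
    return plan[shard_index], max(1, local_procs)
```

With the sample input `{"a": (10, False), "b": (6, True), "c": (5, False), "d": (4, True)}` and two shards, planning with 2 procs puts `b` and `c` on shard 0, while planning with 1 proc puts `b` and `c` on shard 1. If shard 0 used 2 and shard 1 used 1, both would run `b` and `c`, and `a` and `d` would never run. Using the shared constant for the plan avoids this, at the cost of a plan that may not match each machine's real capacity.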
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102871
Approved by: https://github.com/huydhn, https://github.com/malfet