[ROCm] Limit NUM_PROCS to 8 during unit testing (#100133)
- A few AMD machines have more than 8 GPUs, so limit NUM_PARALLEL_PROCS to 8; the number of test shards is therefore also at most 8.
- Parallelizing across more than 8 processes is limited by memory.
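The cap described above can be sketched as follows. This is only an illustration, assuming the process count is otherwise derived from the number of visible GPUs; `capped_num_procs` and `MAX_PARALLEL_PROCS` are hypothetical names and are not taken from the test runner itself.

```python
# Hypothetical constant: the upper bound on parallel test processes
# (and therefore on test shards) described in this commit.
MAX_PARALLEL_PROCS = 8


def capped_num_procs(gpu_count: int, cap: int = MAX_PARALLEL_PROCS) -> int:
    """Return the number of parallel test processes to launch.

    Assumes one process per GPU, never fewer than 1 and never more than `cap`.
    """
    return max(1, min(gpu_count, cap))


if __name__ == "__main__":
    # A 16-GPU machine is clamped to 8 processes; smaller machines are unchanged.
    for gpus in (0, 4, 8, 16):
        print(f"{gpus} GPUs -> {capped_num_procs(gpus)} processes")
```

Under this sketch, a machine with 16 GPUs runs 8 shards in parallel instead of 16, which keeps per-process memory pressure bounded.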
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100133
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/malfet