Increase distributed shards (#84430)
Per the title, increase the number of distributed test shards from 2 to 3.
With 2 shards, the test time was about 1.7 hours, as shown in [HUD](https://hud.pytorch.org/tts/pytorch/pytorch/master?jobName=pull%20%2F%20linux-bionic-cuda11.6-py3.10-gcc7%20%2F%20test%20(distributed%2C%201%2C%202%2C%20linux.8xlarge.nvidia.gpu)).
With 3 shards, the time drops to about 1.1 hours per shard on average (1h16m for the slowest shard):
* 1st shard: https://github.com/pytorch/pytorch/runs/8141516281 (1h16m)
* 2nd shard: https://github.com/pytorch/pytorch/runs/8141516449 (59m)
* 3rd shard: https://github.com/pytorch/pytorch/runs/8141516593 (1h3m)
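
For reference, the "about 1.1 hours" figure can be reproduced from the three shard times above. This is only a quick arithmetic sketch; the variable names are illustrative and not part of the CI configuration.

```python
# Shard durations in minutes, copied from the three runs listed above.
shard_minutes = {"1st": 76, "2nd": 59, "3rd": 63}

# Mean per-shard time and the slowest shard, both converted to hours.
mean_hours = sum(shard_minutes.values()) / len(shard_minutes) / 60
slowest_hours = max(shard_minutes.values()) / 60

print(f"mean: {mean_hours:.2f} h, slowest: {slowest_hours:.2f} h")
# prints "mean: 1.10 h, slowest: 1.27 h"
```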
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84430
Approved by: https://github.com/clee2000