shard `win-vs2019-cuda11.3-py3 / test` from 2 shards to 5 shards (#76867)
Shard `win-vs2019-cuda11.3-py3 / test` into 5 shards instead of 2.
Helps with #76838.
Notes:
- Average TTS (time to signal) for the past week, as of May 5, is 4.7 hours for the 1st shard and 4.5 hours for the 2nd shard on master, and around 4 hours across all branches (though I don't think the changes from removing distributed tests and moving testing off of pull have taken effect yet)
- Sharding has high overhead
- Hopefully TTS doesn't explode
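The tradeoff in the notes above can be sketched with some rough arithmetic. This is only an illustration, not a measurement: it treats the reported 4.7 and 4.5 hour figures as if they were pure shard wall-clock time (TTS also includes other things such as queueing), assumes tests split evenly across shards, and uses a made-up `OVERHEAD_HOURS` value for the fixed cost each shard pays. The function names are hypothetical and not part of the PyTorch CI tooling.

```python
def per_shard_hours(total_work_hours, num_shards, overhead_hours):
    """Estimated wall-clock hours for one shard, assuming an even split of
    test work and the same fixed overhead on every shard."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return overhead_hours + total_work_hours / num_shards


def total_machine_hours(total_work_hours, num_shards, overhead_hours):
    """Estimated machine hours summed over all shards."""
    return num_shards * per_shard_hours(total_work_hours, num_shards, overhead_hours)


# Hypothetical fixed cost per shard; NOT a measured value.
OVERHEAD_HOURS = 0.5

# Figures from the notes, treated as shard durations (a simplification).
CURRENT_SHARD_HOURS = [4.7, 4.5]

# Back out the test work by subtracting the assumed overhead from each shard.
total_work = sum(hours - OVERHEAD_HOURS for hours in CURRENT_SHARD_HOURS)

for n in (2, 5):
    shard = per_shard_hours(total_work, n, OVERHEAD_HOURS)
    machine = total_machine_hours(total_work, n, OVERHEAD_HOURS)
    print(f"{n} shards: ~{shard:.2f} h per shard, ~{machine:.2f} machine hours total")
```

Under these assumptions, more shards shorten each shard's run but raise total machine time, because every extra shard pays the fixed overhead again. The larger the real overhead, the smaller the benefit from 5 shards.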
Sharding spreadsheet: https://docs.google.com/spreadsheets/d/1BdtVsjRr0Is9LXMNilR02FEdPXNq7zEWl8AmR3ArsLQ/edit#gid=1153012347
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76867
Approved by: https://github.com/suo, https://github.com/janeyx99