Add testing on A10G GPU to periodic workflow (#85524)
This enables testing of many modern CUDA features on an sm_86-capable GPU.
While migrating to that platform, we discovered that the `functorch` tests for `nn.functional.conv_transpose3d` produce garbage results on sm_80+, and that two `nvfuser` tests unexpectedly pass while one unexpectedly fails.
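A note on the `sm_XY` notation: "sm_86" denotes CUDA compute capability 8.6. This means the A10G is inside the "sm_80+" range (compute capability >= 8.0) where the `functorch` failures show up. The sketch below shows that comparison under that assumption. `parse_sm` and `is_sm80_or_newer` are hypothetical helpers and are not part of this change. The only real API referenced is `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple for the current device.

```python
# Hypothetical helpers (not part of this PR): decide whether a CUDA compute
# capability falls in the "sm_80+" range. On a real machine the (major, minor)
# tuple can be obtained with torch.cuda.get_device_capability().
from typing import Tuple


def parse_sm(arch: str) -> Tuple[int, int]:
    """Convert an architecture string such as "sm_86" into (8, 6)."""
    if not arch.startswith("sm_"):
        raise ValueError(f"unexpected architecture string: {arch!r}")
    digits = arch[len("sm_"):]
    # The last digit is the minor version; everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def is_sm80_or_newer(capability: Tuple[int, int]) -> bool:
    """True when compute capability >= 8.0 (Ampere or newer)."""
    return capability >= (8, 0)


if __name__ == "__main__":
    for arch in ("sm_75", "sm_80", "sm_86"):
        cap = parse_sm(arch)
        print(arch, cap, is_sm80_or_newer(cap))
```

Tuple comparison is lexicographic in Python, so `(8, 6) >= (8, 0)` holds and `(7, 5) >= (8, 0)` does not. That matches the intended meaning of "sm_80+".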
TODO:
- Investigate the unexpected success of `test_vmapvjp_linalg_householder_product_cuda_float32` and add a `functorch` shard
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85524
Approved by: https://github.com/ngimel