[BE][FSDP] Subtest prefetching in `test_fsdp_core.py` (#80908)
This moves the forward and backward prefetching configurations into subtests.
**On the AI AWS cluster, this reduces the `test_fsdp_core.py` time to signal (TTS) from ~2200 seconds (36 minutes) to 480 seconds (8 minutes).**
This introduces `run_subtests()` in `common_fsdp.py` and `_get_subtest_config()` in `test_fsdp_core.py`. Feel free to give suggestions for a cleaner way to do this.
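For reviewers unfamiliar with the pattern, here is a rough sketch of how a subtest helper of this kind can work. It is not the code in `common_fsdp.py`: the signatures, the `expand_subtest_config()` helper, the `PrefetchSubtestExample` class, and the string stand-ins for the prefetch options are all assumptions made for illustration. The idea is to take a dict mapping each keyword argument to its candidate values and run the test body once per combination inside `unittest`'s `subTest()`, so that any per-test setup cost is presumably paid once per test method rather than once per configuration.

```python
import itertools
import unittest
from typing import Any, Callable, Dict, List


def expand_subtest_config(subtest_config: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one kwargs dict per combination of the configured values."""
    keys = list(subtest_config.keys())
    value_lists = [subtest_config[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def run_subtests(
    test_case: unittest.TestCase,
    subtest_config: Dict[str, List[Any]],
    test_fn: Callable[..., None],
    **test_kwargs: Any,
) -> None:
    """Hypothetical helper: run `test_fn` once per config combination as a subtest."""
    for subtest_kwargs in expand_subtest_config(subtest_config):
        # subTest() labels a failure with the kwargs that produced it and
        # lets the remaining combinations keep running.
        with test_case.subTest(**subtest_kwargs):
            test_fn(**test_kwargs, **subtest_kwargs)


class PrefetchSubtestExample(unittest.TestCase):
    """Illustrative test case; the option values are placeholders, not FSDP enums."""

    def _get_subtest_config(self) -> Dict[str, List[Any]]:
        return {
            "backward_prefetch": [None, "BACKWARD_PRE", "BACKWARD_POST"],
            "forward_prefetch": [False, True],
        }

    def _check_config(self, backward_prefetch: Any, forward_prefetch: bool) -> None:
        # Stand-in for the real test body, which would build and train a model.
        self.assertIsInstance(forward_prefetch, bool)
        self.assertIn(backward_prefetch, (None, "BACKWARD_PRE", "BACKWARD_POST"))

    def test_prefetching(self) -> None:
        run_subtests(self, self._get_subtest_config(), self._check_config)
```

With three backward options and two forward options, the sketch runs six subtests inside a single test method. A distributed version would likely also need to synchronize ranks between subtests, which is omitted here.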
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80908
Approved by: https://github.com/rohan-varma