[BE][FSDP] Subtest prefetching in `test_mixed_precision_e2e_full_shard()` (#80915)
This moves the forward and backward prefetching configurations to be subtests, just like the previous PR. It targets only the single test `test_mixed_precision_e2e_full_shard()`; however, that test originally corresponded to 161 of the 170 tests in `test_fsdp_mixed_precision.py`, so it covers the bulk of the TTS (time to signal) contribution.
**On the AI AWS cluster, this reduces the `test_mixed_precision_e2e_full_shard()` TTS from ~1200 seconds (20 minutes) to 270 seconds (4.5 minutes).**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80915
Approved by: https://github.com/rohan-varma