[FSDP2][DCP][DSD] Add test to ensure FSDP2 model/optim state_dict work after a full training loop (#120871)
This PR adds tests verifying that distributed state dict (DSD) works properly with FSDP2's model and optimizer state_dict after a full training loop.
We test the combinations of these options on an evenly sharded model:
```
{
    "reshard_after_forward": [True, False],
    "optimizer_class": [torch.optim.Adam],
    "compile_model": [True, False],
}
```
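For illustration, here is a minimal sketch of how one such combination might be exercised, assuming a default process group is already initialized. The `_train_and_get_state_dicts` helper, the placeholder model, and the loop length are illustrative, not the actual test code; module paths follow the FSDP2 prototype at the time of this PR.
```
import torch
from torch.distributed._composable.fsdp import fully_shard
from torch.distributed.checkpoint.state_dict import get_state_dict

def _train_and_get_state_dicts(reshard_after_forward, optimizer_class, compile_model):
    # Hypothetical helper, not from the PR. Assumes torch.distributed
    # is already initialized (e.g. via init_process_group).
    model = torch.nn.Linear(8, 8)  # placeholder evenly sharded model
    fully_shard(model, reshard_after_forward=reshard_after_forward)
    if compile_model:
        model = torch.compile(model)
    optim = optimizer_class(model.parameters(), lr=1e-2)
    for _ in range(3):  # full training loop: forward, backward, step
        loss = model(torch.rand(4, 8)).sum()
        loss.backward()
        optim.step()
        optim.zero_grad()
    # DSD returns unified model/optimizer state dicts that should be
    # consistent regardless of the sharding/compile configuration above.
    model_sd, optim_sd = get_state_dict(model, optim)
    return model_sd, optim_sd
```
Running the check after a full training loop (rather than at initialization) matters because Adam's optimizer state is only materialized on the first `optim.step()`, so DSD must correctly handle the sharded optimizer states that exist at that point.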
Follow-ups:
1. Add a test for an unevenly sharded model.
2. Add a test that includes `torch.optim.AdamW` (it seems to have some gaps currently; still investigating).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120871
Approved by: https://github.com/fegin