[FSDP] Do not include empty state in `_flatten_optim_state_dict()` (#88353)
https://github.com/pytorch/pytorch/blob/983c0e7f3101f1543bed6c4ec1539a4d590a94c0/torch/optim/adam.py#L163
The above line requires that a candidate optimizer state dict being loaded via `load_state_dict()` have non-empty state for its 0th parameter (it indexes `state_values[0]`). This PR changes FSDP to include only non-empty mappings in the state returned by `_flatten_optim_state_dict()`, the shared subroutine behind both `shard_full_optim_state_dict()` and `flatten_sharded_optim_state_dict()`; the skip-empty-state pattern is sketched below.
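A minimal sketch of the failure mode and the fix pattern, assuming the behavior at the pinned commit (where `Adam.__setstate__` inspects `state_values[0]['step']`). The helper name `flatten_optim_state_dict_sketch` is hypothetical; the real `_flatten_optim_state_dict()` also handles parameter flattening and rank-local sharding:

```python
import torch

# Repro of the failure mode: an optimizer state dict whose 0th parameter
# carries an empty state mapping. On the pinned commit, Adam.__setstate__
# computes state_values = list(self.state.values()) and then indexes
# state_values[0]['step'], so the empty mapping raises KeyError.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([param])
osd = {"state": {0: {}}, "param_groups": opt.state_dict()["param_groups"]}
# opt.load_state_dict(osd)  # KeyError: 'step' before this fix


# Simplified illustration of the fix: drop empty per-parameter state
# mappings instead of emitting `param_key: {}` entries.
def flatten_optim_state_dict_sketch(unflat_osd: dict) -> dict:
    flat_state = {
        param_key: param_state
        for param_key, param_state in unflat_osd["state"].items()
        if param_state  # skip empty mappings
    }
    return {"state": flat_state, "param_groups": unflat_osd["param_groups"]}
```

With empty mappings omitted, `state_values[0]` (when state is non-empty at all) always refers to a parameter that actually has saved state, so the `'step'` lookup succeeds.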
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88353
Approved by: https://github.com/fegin