[FSDP] Relax `sharded_grad` assert to allow IDLE (#96584)
`_use_sharded_grad_views()` can also be called when re-registering the original parameters in `load_state_dict()`, in which case the training state is `IDLE`. Previously, I only expected `_use_sharded_grad_views()` to be called in `FORWARD` when the sharded gradient is not in `_saved_grad_shard` or `_cpu_grad`, so the `IDLE` call path tripped the assert. This PR relaxes the assert to also permit `IDLE`.
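For illustration only, here is a minimal sketch of the shape of the relaxed check. It is not the actual FSDP source: the enum, function name, and assert message are stand-ins, assuming a training-state enum with `IDLE` and `FORWARD` members and a per-handle training state that is consulted before using sharded gradient views.

```python
# Hypothetical sketch of the relaxed state check -- not PyTorch's implementation.
from enum import Enum, auto


class TrainingState(Enum):
    # Assumed minimal stand-in for FSDP's training-state enum.
    IDLE = auto()
    FORWARD = auto()
    BACKWARD = auto()


def _check_state_for_sharded_grad_views(training_state: TrainingState) -> None:
    # Before this change, only FORWARD was accepted here, so re-registering the
    # original parameters from `load_state_dict()` (which runs in IDLE) failed.
    allowed = (TrainingState.FORWARD, TrainingState.IDLE)
    assert training_state in allowed, (
        f"Expected training state in {allowed}, got {training_state}"
    )
```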
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96584
Approved by: https://github.com/fegin, https://github.com/zhaojuanmao