[FSDP] Sharded Grad Scaler (#76918)
Summary: Add a shard-aware gradient scaler to support FSDP with MixedPrecision.
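
For readers unfamiliar with why a scaler needs to be shard-aware, here is a rough, pure-Python sketch of the general idea. It is not the code in this PR. It assumes each rank only sees its own gradient shard, so the "found inf/NaN" flag has to be combined across ranks before deciding whether to step. The helper names (`unscale_shard`, `sharded_unscale`) are hypothetical, and the `any(...)` over per-rank flags stands in for a collective all-reduce.

```python
import math


def unscale_shard(grads, scale):
    """Unscale one rank's gradient shard and report whether it holds inf/NaN."""
    inv_scale = 1.0 / scale
    unscaled = [g * inv_scale for g in grads]
    found_inf = any(not math.isfinite(g) for g in unscaled)
    return unscaled, found_inf


def sharded_unscale(shards, scale):
    """Hypothetical helper: simulate all ranks inside a single process.

    `shards[i]` is the list of gradients owned by rank i. In a real
    distributed run each rank would compute its own flag and the flags
    would be combined with a collective; here `any` plays that role.
    """
    results = [unscale_shard(shard, scale) for shard in shards]
    global_found_inf = any(flag for _, flag in results)
    return [unscaled for unscaled, _ in results], global_found_inf


# Rank 1 overflowed, rank 0 did not: every rank must still skip the step.
shards = [[1024.0, 2048.0], [float("inf"), 512.0]]
unscaled, found_inf = sharded_unscale(shards, scale=1024.0)
should_step = not found_inf
```

Without the cross-rank combination, rank 0 in this example would see only finite values and apply its optimizer step while rank 1 skipped, leaving the sharded parameters inconsistent.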
Test Plan: Tests added
Differential Revision: D35988676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76918
Approved by: https://github.com/rohan-varma