[FSDP] Add `keep_low_precision_grads` support when CPU offloading (#86495)
When CPU offloading is enabled, FSDP stores the sharded gradient in `_cpu_grad` rather than `_saved_grad_shard`. This adds support for `keep_low_precision_grads` in that case as well, so the offloaded gradient is also kept in low precision.
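A minimal sketch of the intended behavior, using NumPy as a stand-in for tensors and a hypothetical `FlatParamShard`/`finalize_grad` (these names and the control flow are illustrative, not FSDP's actual implementation): the reduced gradient lands in `_cpu_grad` when offloading and in `_saved_grad_shard` otherwise, and the low-precision cast must cover both buffers.

```python
import numpy as np

class FlatParamShard:
    """Toy stand-in for an FSDP flat-parameter shard (hypothetical names)."""

    def __init__(self, cpu_offload, keep_low_precision_grads,
                 low_prec_dtype=np.float16):
        self.cpu_offload = cpu_offload
        self.keep_low_precision_grads = keep_low_precision_grads
        self.low_prec_dtype = low_prec_dtype
        self._cpu_grad = None          # holds the grad when CPU offloading
        self._saved_grad_shard = None  # holds the grad otherwise

    def finalize_grad(self, reduced_grad):
        # Reduced gradients arrive in full precision (float32 here).
        if self.cpu_offload:
            # Stand-in for copying the shard to a pinned CPU buffer.
            self._cpu_grad = reduced_grad.copy()
            target = "_cpu_grad"
        else:
            self._saved_grad_shard = reduced_grad.copy()
            target = "_saved_grad_shard"
        # The point of the change: downcast whichever buffer holds the
        # gradient, so `_cpu_grad` is covered too, not only
        # `_saved_grad_shard`.
        if self.keep_low_precision_grads:
            setattr(self, target,
                    getattr(self, target).astype(self.low_prec_dtype))

shard = FlatParamShard(cpu_offload=True, keep_low_precision_grads=True)
shard.finalize_grad(np.ones(4, dtype=np.float32))
print(shard._cpu_grad.dtype)  # float16 once the cast also covers _cpu_grad
```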
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86495
Approved by: https://github.com/rohan-varma