support pure fp16 training in FSDP (#68417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68417
1. Since parameter attributes are lazily initialized at the beginning of forward, it makes more sense to initialize `full_param_padded` with the parameters' data type at `lazy_init` time rather than at construction time, because the parameters' data type may change after construction and before the training loop (see the first sketch below).
2. Add a check for whether parameter storage has been changed outside FSDP, and handle that case properly (see the second sketch below).
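
For (1), a minimal sketch of the scenario this change enables. The single-GPU NCCL setup, env vars, and import path are assumptions for illustration, not part of this PR:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)

model = FSDP(nn.Linear(8, 8).cuda())
# Cast parameters to fp16 *after* FSDP construction but *before* the first
# forward. Because full_param_padded is created during lazy_init (at the
# start of the first forward), it now picks up the fp16 dtype rather than
# the fp32 dtype the parameters had at construction time.
model = model.half()
out = model(torch.randn(4, 8, device="cuda", dtype=torch.float16))
out.sum().backward()
```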
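
For (2), an illustrative sketch of the kind of check described; the helper name and the recorded-pointer bookkeeping are hypothetical, not the actual FSDP implementation:

```python
import torch

def _storage_changed_outside_fsdp(param: torch.nn.Parameter,
                                  recorded_ptr: int) -> bool:
    # Hypothetical check: if user code (e.g., model.half()) swapped out the
    # parameter's underlying storage after FSDP recorded recorded_ptr, the
    # data pointer no longer matches, and FSDP must rebuild its padded
    # full-parameter buffer instead of reusing a stale one.
    return param.data_ptr() != recorded_ptr
```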
ghstack-source-id: 144479019
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D32458643
fbshipit-source-id: 0e07e5e08270f2e265e8f49124a6648641e42e7a