Allow `float16` and `bfloat16` computation dtypes and sharding groups in XLA FSDP (#3588) (#3617)
* allow specifying a computation dtype other than `float32` in XLA FSDP; also allow groups for sharding and gradient clipping
* minor update and lint
* minor fix to model `extra_repr`
Co-authored-by: Ronghang Hu <ronghang.hu@gmail.com>