Allow `float16` and `bfloat16` computation dtypes and sharding groups in XLA FSDP (#3588)
* allow specifying a computation dtype other than `float32` in XLA FSDP; also allow specifying groups for sharding and gradient clipping
* minor update and lint
* minor fix to model `extra_repr`
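
For illustration, a minimal sketch of what "groups for sharding" could look like: the global ranks are partitioned into disjoint groups, and each rank shards parameters only within its own group. The helper names `make_sharding_groups` and `locate_rank` are hypothetical and not part of this change; how XLA FSDP actually consumes such groups is not stated in this message.

```python
# Hypothetical helpers, for illustration only; not part of this change.

def make_sharding_groups(world_size, group_size):
    """Partition ranks 0..world_size-1 into contiguous groups of group_size."""
    if group_size <= 0 or world_size % group_size != 0:
        raise ValueError("world_size must be a positive multiple of group_size")
    return [list(range(start, start + group_size))
            for start in range(0, world_size, group_size)]


def locate_rank(groups, rank):
    """Return (rank within its group, size of that group) for a global rank."""
    for group in groups:
        if rank in group:
            return group.index(rank), len(group)
    raise ValueError(f"rank {rank} is not in any sharding group")


# Assumed setup: 8 devices, sharding within groups of 4.
groups = make_sharding_groups(world_size=8, group_size=4)
sharding_rank, sharding_world_size = locate_rank(groups, rank=5)
print(groups)                              # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(sharding_rank, sharding_world_size)  # 1 4
```

With this layout, global rank 5 would hold shard 1 of 4 within the second group, so reductions for sharding and gradient clipping would span 4 ranks instead of all 8.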