Add reduce-scatter coalescing support for FSDP.
Also allow using reduce-scatter's scale param in FSDP.
(revived https://github.com/pytorch/xla/pull/4145)
Make coalesced reduce-scatter compatible with the OpenXLA reduce-scatter tuple change without a token
Rename GetOperandList to GetOperandListWithToken
Add separate BuildReduceScatterCoalesced builder
Use token_handler.GetInput to consume the token
If bucket_size_mb is 0 (the default), reduce-scatter each tensor individually instead of coalescing
Fix error checking in xm.reduce_scatter
Move FSDP changes to another PR
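
Illustrative sketch of the bucket_size_mb behavior described above. This is not the code in this PR: plan_buckets and its greedy grouping policy are hypothetical, written in plain Python only to show how a value of 0 could map to one reduce-scatter per tensor while a positive value coalesces tensors up to the bucket size.

```python
from typing import List

MB = 1024 * 1024


def plan_buckets(tensor_bytes: List[int], bucket_size_mb: int = 0) -> List[List[int]]:
    """Group tensor indices into reduce-scatter calls (hypothetical helper).

    Each inner list holds the indices of tensors that would share one
    (coalesced) reduce-scatter call.
    """
    # bucket_size_mb == 0 (the default): no coalescing, one call per tensor.
    if bucket_size_mb == 0:
        return [[i] for i in range(len(tensor_bytes))]

    cap = bucket_size_mb * MB
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, nbytes in enumerate(tensor_bytes):
        # Flush the current bucket if adding this tensor would exceed the cap.
        # A tensor larger than the cap still gets a bucket of its own.
        if current and current_bytes + nbytes > cap:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets
```

With three 1 MB tensors, plan_buckets([MB, MB, MB]) keeps every tensor in its own call, while plan_buckets([MB, MB, MB], bucket_size_mb=2) groups the first two together and leaves the third on its own.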