Add all-gather coalescing for FSDP/ZeRO1 (#5950)

* Add all-gather and reduce-scatter coalescing support for FSDP.
Also allow using reduce-scatter's scale parameter in FSDP.
(revived https://github.com/pytorch/xla/pull/4145)
* clang-format-7 and python lint fixes
* Fix "SyntaxError: 'return' outside function" error
* Code/test fixes to get run_tests.sh to run on CPU
* Fix all-gather to be compatible with the OpenXLA all-gather tuple change without token
* Fix reduce-scatter-coalesce to be compatible with the OpenXLA reduce-scatter tuple change without token
* Separate out the reduce-scatter-coalesce changes into a separate PR
* Some cleanups
* Add separate BuildAllGatherCoalesced builder and AllGatherCoalesced class
* Use token_handler.GetInput to capture token
* Clean up
* Clean up
* Rename func GetOperandList to GetOperandListWithToken