Reduce gradients first for XLA when unscaling gradients in mixed precision training with AMP (#1926)
* reduce gradients first for XLA when unscaling gradients in mixed
precision training with AMP
* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* update accelerator.reduce and accelerate.utils.operations.reduce
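
Note on ordering: a minimal, framework-free sketch of why reducing before unscaling can matter. It assumes gradients on XLA are still local to each replica at the point where they are unscaled. Plain Python lists stand in for per-replica gradients, and `all_reduce_mean` and `unscale` are hypothetical helpers for illustration, not accelerate or torch_xla APIs.

```python
import math

# Illustrative only: these helpers are hypothetical stand-ins, not library calls.

def all_reduce_mean(per_replica_grads):
    """Average each gradient element across replicas (stand-in for an all-reduce scaled by 1/world_size)."""
    world_size = len(per_replica_grads)
    return [sum(vals) / world_size for vals in zip(*per_replica_grads)]

def unscale(grads, loss_scale):
    """Divide gradients by the loss scale and report whether any value is non-finite."""
    unscaled = [g / loss_scale for g in grads]
    found_inf = any(not math.isfinite(g) for g in unscaled)
    return unscaled, found_inf

loss_scale = 1024.0
# Two replicas, two gradient elements each; replica 1 overflowed on the first element.
per_replica = [[2048.0, 512.0], [float("inf"), 1536.0]]

# Reduce first: unscaling runs on the reduced gradients, so the overflow check
# sees the same values on every replica.
reduced = all_reduce_mean(per_replica)
grads, found_inf = unscale(reduced, loss_scale)

# Unscale first: each replica checks only its local gradients, so the
# overflow decisions can differ between replicas.
local_flags = [unscale(g, loss_scale)[1] for g in per_replica]

print(grads, found_inf)
print(local_flags)
```

In this toy setup the reduce-first path yields a single overflow decision, while the unscale-first path leaves replica 0 unaware that replica 1 overflowed.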
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>