Enable grad scaler for ZeRO-0 + torch.autocast path (#7619)
Currently, the DeepSpeed engine does not enable the grad scaler for the
ZeRO-0 + `torch.autocast` path, even when the dtype is set to `fp16`. This
leads to errors in tests when we replace our hard-coded tolerances with
PyTorch’s [standard
tolerances](https://docs.pytorch.org/docs/stable/testing.html#torch.testing.assert_close)
(Thank you @stas00 for your suggestion regarding the previous PR).
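
For reference, the fp16 + `torch.autocast` path is expected to follow the standard PyTorch grad scaler pattern sketched below. This is an illustrative snippet, not the engine code: the model, optimizer, and data are hypothetical placeholders, and it assumes a CUDA device and a recent PyTorch with `torch.amp.GradScaler`.

```python
import torch

# Minimal sketch of the pattern this path was missing: pair torch.autocast(fp16)
# with a grad scaler so fp16 gradients are scaled to avoid underflow.
# Assumes a CUDA device; model/optimizer/data are placeholders.
model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.amp.GradScaler("cuda")

for _ in range(2):
    batch = torch.randn(4, 8, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch).float().pow(2).mean()
    scaler.scale(loss).backward()  # scale the loss before backward
    scaler.step(optimizer)         # unscale grads; skip the step on inf/nan
    scaler.update()                # adjust the scale factor for the next step
```
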
This PR enables the grad scaler for this path to improve accuracy, and
refactors the tests to simplify validation by using
`torch.testing.assert_close`. The tests now rely on PyTorch’s standard
(and stricter) tolerances, and they still pass.
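
For illustration, the test-side change boils down to calling `torch.testing.assert_close` with its dtype-derived default tolerances instead of hand-coded ones. The snippet below is a standalone sketch with placeholder tensors, not the actual test:

```python
import torch

# assert_close picks rtol/atol from the dtype (fp32: rtol=1.3e-6, atol=1e-5;
# fp16: rtol=1e-3, atol=1e-5), so no hand-coded tolerances are needed.
expected = torch.tensor([0.5, 1.0, 2.0])
actual = expected + 5e-6  # difference stays within the fp32 defaults
torch.testing.assert_close(actual, expected)

# Tolerances can still be loosened explicitly where a case genuinely needs it:
# torch.testing.assert_close(actual, expected, rtol=1e-2, atol=1e-3)
```
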
---------
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>