DeepSpeed
b82ef716 - Improve error message and reduce validation in autocast test (#7547)

Commit

161 days ago

Improve error message and reduce validation in autocast test (#7547)

This PR improves error logging and relaxes loss value checks in the autocast test.

Previously, the test displayed error messages and mismatched loss values on all ranks, even if the mismatch only occurred on some ranks. This was confusing, since logs from other ranks could appear correct. This PR changes the behavior so that error messages are shown only on the ranks where the mismatch occurs.

Additionally, this PR skips loss value validation for `test_lower_precision_model`, where we intentionally use a different communication dtype from the baseline (standard PyTorch autocast).

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
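
The per-rank reporting described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the function name `compare_loss_on_rank`, the `validate` flag, the tolerance, and the simulated losses are all hypothetical, and the example uses plain Python rather than the DeepSpeed test harness. It only shows the idea that a rank produces an error message when its own loss mismatches the baseline, and that validation can be skipped when a run is expected to diverge.

```python
import math
from typing import Optional


def compare_loss_on_rank(rank: int, loss: float, baseline_loss: float,
                         rtol: float = 1e-2, validate: bool = True) -> Optional[str]:
    """Return an error message for this rank only if its own loss mismatches.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    if not validate:
        # Assumed switch: skip the check when the run intentionally differs
        # from the baseline (for example, a different communication dtype).
        return None
    if math.isclose(loss, baseline_loss, rel_tol=rtol):
        # This rank matches, so it stays silent.
        return None
    return (f"[rank {rank}] loss mismatch: got {loss:.6f}, "
            f"baseline {baseline_loss:.6f} (rtol={rtol})")


# Simulated per-rank losses (made-up values); only rank 1 diverges.
baseline = 2.0
losses = {0: 2.001, 1: 2.5, 2: 1.999}
messages = {r: compare_loss_on_rank(r, v, baseline) for r, v in losses.items()}

for msg in messages.values():
    if msg is not None:
        print(msg)
```

In this sketch only the diverging rank emits a message, so logs from matching ranks stay clean. Passing `validate=False` models skipping the loss check for a case such as `test_lower_precision_model`.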

References

#7547 - Improve error message and reduce validation in autocast test

Author

tohtana

Parents

08879a39
