DeepSpeed
Reland perf fix for nan inf check
#7184
Merged

nelyahu
nelyahu requested a review from tjruwase 1 year ago
nelyahu requested a review from tohtana 1 year ago
nelyahu force pushed from 6f123cc7 to 2166ff45 1 year ago
tjruwase commented on 2025-03-30
Revert "Fix issue #5242 grad_norm and loss is nan (#7171)"
07cf3e21
replace boolean compute with torch.where for nan/inf detection
37a0454d
move logic to runtime.utils and reuse code
cddef4c7
nelyahu force pushed from 889c90d9 to cddef4c7 1 year ago
tjruwase approved these changes on 2025-03-31
loadams
fix formatting issues
5797162c
loadams Merge branch 'master' into reland_perf_fix_for_nan_inf_check
efff1bf4
hwchen2017 Merge branch 'master' into reland_perf_fix_for_nan_inf_check
4a003e01
loadams merged 3c1817f3 into master 1 year ago
