DeepSpeed
Reland perf fix for nan inf check
#7184
Merged
loadams merged 6 commits into deepspeedai:master from nelyahu:reland_perf_fix_for_nan_inf_check
nelyahu requested a review from tjruwase 1 year ago
nelyahu requested a review from tohtana 1 year ago
nelyahu force pushed from 6f123cc7 to 2166ff45 1 year ago
tjruwase commented on 2025-03-30
Revert "Fix issue #5242 grad_norm and loss is nan (#7171)"
07cf3e21
replace boolean compute with torch.where for nan/inf detection
37a0454d
move logic to runtime.utils and reuse code
cddef4c7
nelyahu force pushed from 889c90d9 to cddef4c7 1 year ago
tjruwase approved these changes on 2025-03-31
fix formatting issues
5797162c
Merge branch 'master' into reland_perf_fix_for_nan_inf_check
efff1bf4
Merge branch 'master' into reland_perf_fix_for_nan_inf_check
4a003e01
loadams merged 3c1817f3 into master 1 year ago
Reviewers
tjruwase
tohtana
Assignees
No one assigned
Labels
None yet
Milestone
No milestone