DeepSpeed
d56e847b - Fix misplaced overflow handling return in fused_optimizer.py (#7645)

Commit

116 days ago

Fix misplaced overflow handling return in fused_optimizer.py (#7645)

This PR fixes an issue in `deepspeed/runtime/fp16/fused_optimizer.py` where the gradient overflow handling logic exited the function too early, resulting in wrong forward pass and loss calculations in certain FP16 training scenarios.

The `return self.overflow` and `self.timers.log(OVERFLOW_TIMERS)` calls are now inside the `if self.overflow:` block, so the function returns early only when an actual overflow is detected.

Origin of the error: https://github.com/deepspeedai/DeepSpeed/commit/889f0ead27435cd7755a73a9ba27e0913ab3c548

cc: @jithunnair-amd

Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
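
The control-flow change described in the commit message can be sketched with a toy class. This is a minimal illustration, not the DeepSpeed source: `ToyFP16Optimizer`, `step_buggy`, `step_fixed`, `_log_overflow_timers`, and `updates_applied` are hypothetical names, and the real `step` method does considerably more work (gradient unscaling, loss-scale updates, the actual parameter update). Only the placement of the log call and the `return self.overflow` relative to `if self.overflow:` is taken from the commit message.

```python
class ToyFP16Optimizer:
    """Hypothetical stand-in used only to show the control flow."""

    def __init__(self):
        self.overflow = False
        self.timer_log = []        # stands in for self.timers.log(OVERFLOW_TIMERS)
        self.updates_applied = 0   # counts steps that reach the weight update

    def _log_overflow_timers(self):
        self.timer_log.append("overflow")

    def step_buggy(self, grads_overflowed):
        self.overflow = grads_overflowed
        if self.overflow:
            pass  # overflow bookkeeping would go here
        # Misplaced: the log and return sit outside the `if`, so they run on
        # every call and the update below is never reached.
        self._log_overflow_timers()
        return self.overflow
        self.updates_applied += 1  # unreachable

    def step_fixed(self, grads_overflowed):
        self.overflow = grads_overflowed
        if self.overflow:
            # Early exit only when an overflow was actually detected.
            self._log_overflow_timers()
            return self.overflow
        self.updates_applied += 1  # normal path: the update runs
        return self.overflow
```

In this sketch, calling `step_buggy(False)` leaves `updates_applied` at 0 even though no overflow occurred, while `step_fixed(False)` increments it and `step_fixed(True)` skips the update and records the overflow log entry.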

References

#7645 - Fix misplaced overflow handling return in fused_optimizer.py

Author

rraminen

Parents

02da3732