Fix misplaced overflow handling return in fused_optimizer.py (#7645)
This PR fixes an issue in deepspeed/runtime/fp16/fused_optimizer.py
where the gradient overflow handling logic exited the function too
early, which led to incorrect forward-pass and loss calculations in
certain FP16 training scenarios.
The `self.timers.log(OVERFLOW_TIMERS)` call and the `return self.overflow`
statement have been moved inside the `if self.overflow:` block, so the
function returns early only when an overflow is actually detected.
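The control-flow difference can be sketched with a toy class. This is a hypothetical, simplified illustration, not DeepSpeed's actual code: `ToyFP16Optimizer`, `_log_overflow_timers`, `_apply_update`, the halving of `loss_scale` on overflow (typical dynamic loss scaling, assumed here), and the final `return self.overflow` in the fixed version are all assumptions made for the example.

```python
class ToyFP16Optimizer:
    """Hypothetical stand-in for an FP16 optimizer wrapper (not DeepSpeed code)."""

    def __init__(self):
        self.overflow = False
        self.loss_scale = 2.0 ** 16  # assumed initial dynamic loss scale
        self.steps_applied = 0       # counts parameter updates actually applied
        self.timer_logs = 0          # counts calls to the overflow-timer logger

    def _log_overflow_timers(self):
        # Hypothetical stand-in for self.timers.log(OVERFLOW_TIMERS)
        self.timer_logs += 1

    def _apply_update(self):
        # Hypothetical stand-in for unscaling grads and stepping the wrapped optimizer
        self.steps_applied += 1

    def step_buggy(self, overflow):
        self.overflow = overflow
        if self.overflow:
            self.loss_scale /= 2  # assumed: back off the loss scale, skip this step
        # Misplaced: these two lines run on every call, so the function
        # always returns here and the update below is never reached.
        self._log_overflow_timers()
        return self.overflow
        self._apply_update()  # unreachable

    def step_fixed(self, overflow):
        self.overflow = overflow
        if self.overflow:
            self.loss_scale /= 2  # assumed: back off the loss scale, skip this step
            # Early exit only when an overflow was actually detected
            self._log_overflow_timers()
            return self.overflow
        self._apply_update()
        return self.overflow


buggy = ToyFP16Optimizer()
buggy.step_buggy(overflow=False)
print(buggy.steps_applied)  # 0: no update even though there was no overflow

fixed = ToyFP16Optimizer()
fixed.step_fixed(overflow=False)
fixed.step_fixed(overflow=True)
print(fixed.steps_applied, fixed.timer_logs)  # 1 1
```

In the toy buggy version, a step without overflow still logs the overflow timers and returns before any update is applied; in the fixed version, the non-overflow path falls through to the update.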
Origin of the error:
https://github.com/deepspeedai/DeepSpeed/commit/889f0ead27435cd7755a73a9ba27e0913ab3c548
cc: @jithunnair-amd
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>