DeepSpeed
b4e74a91 - Fix OnebitLamb NaN propagation with empty parameters (#7736)

Commit
77 days ago
Fix OnebitLamb NaN propagation with empty parameters (#7736)

Fixed a critical issue where the OnebitLamb optimizer would produce NaNs when optimizing models with empty parameters (numel=0). The scaling-factor calculation divided by sqrt(numel), which for an empty parameter evaluates to 0.0/0.0 -> NaN. This NaN then propagated into the global scaling coefficient, corrupting the state of all other parameters.

Changed the denominator to use max(numel, 1), or a conditional 1.0, so the division is always safe.

---------

Signed-off-by: Rakshit-gen <sisodiarakshit456@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
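The failure mode and the fix can be illustrated with a minimal, framework-free sketch. It assumes that each parameter's scale is the L2 norm of its momentum divided by sqrt(numel), and that these scales are averaged into one global value from which per-parameter coefficients are derived. The names `momentum_scale` and `scaling_coeffs`, the plain-Python lists standing in for tensors, and the fallback coefficient of 1.0 for a zero scale are all hypothetical; this is not DeepSpeed's actual OnebitLamb code.

```python
import math
from typing import List, Sequence, Tuple


def momentum_scale(momentum: Sequence[float], safe: bool = True) -> float:
    """Hypothetical per-parameter scale: L2 norm of the momentum / sqrt(numel)."""
    numel = len(momentum)
    norm = math.sqrt(sum(x * x for x in momentum))
    if safe:
        # Fixed behaviour: clamp numel to at least 1, so an empty
        # parameter yields 0.0 / 1.0 = 0.0 instead of 0.0 / 0.0.
        return norm / math.sqrt(max(numel, 1))
    if numel == 0:
        # Tensor libraries follow IEEE 754 and return NaN for 0.0 / 0.0;
        # plain Python would raise ZeroDivisionError, so emulate the NaN here.
        return math.nan
    return norm / math.sqrt(numel)


def scaling_coeffs(
    momenta: List[Sequence[float]], safe: bool = True
) -> Tuple[float, List[float]]:
    """Average per-parameter scales into one global scale, then derive coefficients."""
    scales = [momentum_scale(m, safe) for m in momenta]
    # A single NaN among the scales makes the global average NaN.
    united = sum(scales) / len(scales)
    # Hypothetical guard (assumption): use 1.0 when a parameter's own scale is
    # not positive. Since NaN > 0 is False, NaN scales also take this branch.
    coeffs = [united / s if s > 0 else 1.0 for s in scales]
    return united, coeffs


# Two non-empty parameters and one empty (numel=0) parameter.
momenta = [[2.0, 2.0, 2.0, 2.0], [], [1.0, 1.0, 1.0, 1.0]]
print(scaling_coeffs(momenta, safe=False))  # (nan, [nan, 1.0, nan])
print(scaling_coeffs(momenta, safe=True))   # (1.0, [0.5, 1.0, 1.0])
```

In the unsafe run, the empty parameter's NaN scale poisons the global average. As a result, every non-empty parameter's coefficient also becomes NaN, which mirrors the corruption described in the commit message. Clamping the denominator keeps the empty parameter's contribution finite, so the other coefficients stay well defined.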