Fix OnebitLamb NaN propagation with empty parameters (#7736)
Fixed a critical issue where the OnebitLamb optimizer would produce
NaNs when optimizing models with empty parameters (numel=0).
The scaling factor calculation involved division by sqrt(numel), which
resulted in 0.0/0.0 -> NaN for empty parameters. This NaN value
propagated to the global scaling coefficient, corrupting the state of
all other parameters.
Changed the denominator to use max(numel, 1), or a conditional
fallback of 1.0, so the division is always well-defined.
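A minimal pure-Python sketch of the failure and the guard, for
illustration only. It is not the DeepSpeed code: the helper names
(ieee_div, rms_scale_unsafe, rms_scale_safe, united_scale) are
hypothetical, plain lists stand in for tensors, and ieee_div is an
assumption that mimics float tensor division (0.0/0.0 -> NaN), since
Python floats raise ZeroDivisionError instead.

```python
import math


def ieee_div(a, b):
    """Divide like a float tensor would: 0.0/0.0 yields NaN, not an exception."""
    try:
        return a / b
    except ZeroDivisionError:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)


def rms_scale_unsafe(values):
    """norm / sqrt(numel) with no guard; an empty list gives 0.0/0.0."""
    norm = math.sqrt(sum(v * v for v in values))
    return ieee_div(norm, math.sqrt(len(values)))


def rms_scale_safe(values):
    """Same scale, but the denominator uses max(numel, 1)."""
    norm = math.sqrt(sum(v * v for v in values))
    return norm / math.sqrt(max(len(values), 1))


def united_scale(scales):
    """Stand-in for a global coefficient: the mean of per-parameter scales."""
    return sum(scales) / len(scales)


# Three "parameters"; the middle one is empty (numel=0).
params = [[3.0, 4.0], [], [1.0, 1.0, 1.0, 1.0]]

unsafe = [rms_scale_unsafe(p) for p in params]
safe = [rms_scale_safe(p) for p in params]

# One NaN scale is enough to make the shared coefficient NaN.
print(math.isnan(united_scale(unsafe)))
# With the guard, the empty parameter contributes 0.0 and the mean is finite.
print(math.isfinite(united_scale(safe)))
```

In this sketch the empty parameter contributes a scale of 0.0 rather
than NaN, so the averaged coefficient stays finite for the non-empty
parameters.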
---------
Signed-off-by: Rakshit-gen <sisodiarakshit456@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>