DeepSpeed
114f971c - Skip empty parameters in gradient reduction (#7789)

Committed 12 days ago
#7736 fixed an issue with OnebitLamb NaN propagation. With that fix, the optimizer correctly filters out empty parameters, but the DeepSpeed engine's gradient allreduce, which runs separately from the optimizer, still included the gradients of empty parameters.

This PR addresses the issue by skipping empty parameters (`numel == 0`) in `_get_gradients_for_reduction()`.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
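As a rough illustration of the fix, the sketch below shows a gradient-collection loop that skips empty parameters before handing gradients to an allreduce. This is not DeepSpeed's actual code: `Param` is a minimal stand-in for `torch.nn.Parameter`, and `get_gradients_for_reduction` is a hypothetical helper modeled on the behavior described above.

```python
# Illustrative sketch only -- `Param` and `get_gradients_for_reduction`
# are stand-ins, not DeepSpeed's real classes or functions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Param:
    data: List[float]
    grad: Optional[List[float]] = None

    def numel(self) -> int:
        return len(self.data)


def get_gradients_for_reduction(params: List[Param]) -> List[List[float]]:
    grads = []
    for p in params:
        # Skip empty parameters (numel == 0): they carry no gradient
        # data, and passing them to a collective allreduce alongside
        # real gradients is what the PR avoids.
        if p.numel() == 0:
            continue
        if p.grad is None:
            # Materialize a zero gradient so every reduced parameter
            # contributes a buffer of the right size.
            p.grad = [0.0] * p.numel()
        grads.append(p.grad)
    return grads


params = [Param([1.0, 2.0], grad=[0.1, 0.2]), Param([])]
print(get_gradients_for_reduction(params))  # the empty param is dropped
```

The key point is the `numel() == 0` guard: with it, the reduction path and the optimizer agree on which parameters participate, so the engine no longer reduces gradients the optimizer has already filtered out.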