DeepSpeed
505ffa67 - fix: skip empty parameters in gradient reduction

Commit
34 days ago
Empty parameters (numel=0) break gradient allreduce when it relies on flatten/unflatten operations: the unflatten step fails with shape mismatches because empty tensors can't be properly reconstructed from a flattened buffer.

This fix skips empty parameters in _get_gradients_for_reduction(), since they contribute nothing to gradient reduction anyway.

Fixes test_onebit.py::TestOneBitLambEmptyParameters::test

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
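The change amounts to filtering out empty parameters before their gradients are packed into the flat allreduce buffer. The snippet below is a minimal sketch of that idea, not DeepSpeed's code: plain Python lists stand in for torch tensors, and FakeParam, flatten, unflatten, and allreduce_sum are hypothetical helpers. get_gradients_for_reduction only borrows the name of _get_gradients_for_reduction() for readability. The sketch does not reproduce the shape-mismatch failure; it only shows where the numel() == 0 filter sits in a flatten, reduce, unflatten round trip.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeParam:
    """Hypothetical stand-in for a parameter: flat data plus a flat gradient."""
    name: str
    data: List[float]
    grad: List[float]

    def numel(self) -> int:
        return len(self.data)


def get_gradients_for_reduction(params: List[FakeParam]) -> List[List[float]]:
    """Collect gradients for reduction, skipping empty (numel() == 0) params."""
    grads = []
    for p in params:
        if p.numel() == 0:
            # Empty parameter: nothing to reduce, so keep it out of the buffer.
            continue
        grads.append(p.grad)
    return grads


def flatten(tensors: List[List[float]]):
    """Concatenate gradients into one buffer and record each original size."""
    flat = [x for t in tensors for x in t]
    sizes = [len(t) for t in tensors]
    return flat, sizes


def unflatten(flat: List[float], sizes: List[int]) -> List[List[float]]:
    """Split a flat buffer back into pieces using the recorded sizes."""
    out, offset = [], 0
    for n in sizes:
        out.append(flat[offset:offset + n])
        offset += n
    return out


def allreduce_sum(per_rank_buffers: List[List[float]]) -> List[float]:
    """Simulated allreduce: element-wise sum of every rank's flat buffer."""
    return [sum(vals) for vals in zip(*per_rank_buffers)]


# Two simulated ranks holding the same parameters, including one empty one.
rank0 = [FakeParam("w", [1.0, 2.0], [1.0, 2.0]),
         FakeParam("empty", [], []),
         FakeParam("b", [0.0], [0.5])]
rank1 = [FakeParam("w", [1.0, 2.0], [3.0, 4.0]),
         FakeParam("empty", [], []),
         FakeParam("b", [0.0], [0.5])]

grads0 = get_gradients_for_reduction(rank0)
grads1 = get_gradients_for_reduction(rank1)
flat0, sizes = flatten(grads0)
flat1, _ = flatten(grads1)
reduced = unflatten(allreduce_sum([flat0, flat1]), sizes)
print(reduced)  # [[4.0, 6.0], [1.0]]
```

Because every rank applies the same filter, the flat buffers stay the same length across ranks, and the recorded sizes only ever describe non-empty gradients.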