DeepSpeed
4a3234e0 - ZeRO-2: Handle gradients of empty partitions (#275)

Commit
5 years ago
ZeRO-2: Handle gradients of empty partitions (#275)

* Load non-DeepSpeed checkpoints into ZeRO optimizer
* Handle parameters smaller than DP
* Formatting fixes
* Handle empty partitions
* Fix perf bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>