DeepSpeed
[BUG] Fix: Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication
#7847
Open

Loading