DeepSpeed
Optimize the fp-dequantizer to get high memory-BW utilization
#5373
Merged
