DeepSpeed
add bf16 cuda kernel support
#3092
Merged