Megatron-DeepSpeed
Checking we use fused kernels to compute scaled masked softmax on prefix lm
#213
Open
