Megatron-DeepSpeed
Checking we use fused kernels to compute scaled masked softmax on prefix lm
#209
Merged
