DeepSpeed
Add conditional on torch version for scaled_dot_product_attention
#6517
Merged
