DeepSpeed
fix num_kv_heads sharding in uneven autoTP for Falcon-40b
#4712
Merged
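The title refers to splitting attention heads across tensor-parallel ranks when the head count does not divide evenly by the TP degree. In grouped-query models such as Falcon-40b, the key/value projections have fewer heads (`num_kv_heads`) than the query projection. The following is a minimal sketch, not DeepSpeed's implementation. It assumes that an uneven split gives the leading ranks one extra head, and that the KV shards are sized from `num_kv_heads` rather than from the query head count. The helper names `split_heads` and `kv_shard_widths`, and the sizes used below, are hypothetical and chosen only for illustration.

```python
from typing import List


def split_heads(num_heads: int, tp_size: int) -> List[int]:
    """Hypothetical helper: distribute num_heads over tp_size ranks as evenly
    as possible. The first (num_heads % tp_size) ranks each get one extra head."""
    base, extra = divmod(num_heads, tp_size)
    return [base + (1 if rank < extra else 0) for rank in range(tp_size)]


def kv_shard_widths(num_kv_heads: int, head_dim: int, tp_size: int) -> List[int]:
    """Hypothetical helper: output width of each rank's K (or V) projection shard,
    computed from the KV head count, not the query head count."""
    return [heads * head_dim for heads in split_heads(num_kv_heads, tp_size)]


if __name__ == "__main__":
    # Illustrative sizes only: 128 query heads, 8 KV heads, head_dim 64, TP degree 3.
    print(split_heads(128, 3))          # query heads per rank
    print(split_heads(8, 3))            # KV heads per rank
    print(kv_shard_widths(8, 64, 3))    # K/V shard widths per rank
```

With these sizes, an uneven split of the 8 KV heads is `[3, 3, 2]`, while the 128 query heads split as `[43, 43, 42]`. Sizing the KV shards from the query split would therefore give the wrong widths.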