DeepSpeed
45fce45c - Add deepseek autotp (#6937)

Add deepseek autotp (#6937)

This adds automatic tensor parallelism (AutoTP) support for DeepSeek, covering both Multi-Head Latent Attention (MLA) and MoE. For MLA tensor parallelism, the two low-rank projection layers ("q_a_proj" and "kv_a_proj_with_mqa") need to be skipped. For DeepSeek MoE, tp_parse only sees the MoE layer name as layer_idx.down_proj, which makes it hard to add a dedicated policy, so the down_proj layer is assigned to all_reduce_linears by default.
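As a rough illustration of the partitioning rule described above (not the actual DeepSpeed AutoTP code), the sketch below classifies DeepSeek layer names into replicate / all-reduce / shard buckets. The helper name `classify_deepseek_layer` and the returned labels are hypothetical and only mirror the behavior the commit message describes.

```python
# Hypothetical sketch of the AutoTP rule from the commit message; the names
# below are illustrative, not DeepSpeed's actual API.

# Low-rank MLA projections that stay replicated (skipped by TP).
MLA_SKIP_LAYERS = ("q_a_proj", "kv_a_proj_with_mqa")

def classify_deepseek_layer(layer_name: str) -> str:
    """Return how a DeepSeek layer would be handled under tensor parallelism.

    - 'replicate'  : low-rank MLA layers are kept whole on every rank.
    - 'all_reduce' : MoE down_proj outputs are summed across ranks.
    - 'shard'      : everything else is split across ranks as usual.
    """
    short_name = layer_name.split(".")[-1]
    if short_name in MLA_SKIP_LAYERS:
        return "replicate"
    if short_name == "down_proj":
        # tp_parse only sees "<layer_idx>.down_proj", so instead of a
        # per-layer policy the commit defaults down_proj to all_reduce_linears.
        return "all_reduce"
    return "shard"

if __name__ == "__main__":
    for name in ("model.layers.3.self_attn.q_a_proj",
                 "model.layers.3.mlp.experts.7.down_proj",
                 "model.layers.3.self_attn.o_proj"):
        print(f"{name}: {classify_deepseek_layer(name)}")
```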