Add deepseek autotp (#6937)
This PR adds AutoTP support for DeepSeek models, which use Multi-Head Latent Attention (MLA) and MoE layers.
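A minimal usage sketch (not part of this PR) of running a DeepSeek checkpoint through the AutoTP path; the model name, dtype, and the `WORLD_SIZE` lookup are placeholders/assumptions, and `tp_size` would normally come from the `deepspeed` launcher:

```python
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any DeepSeek model with MLA/MoE would follow the same path.
model_name = "deepseek-ai/DeepSeek-V2-Lite"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# replace_with_kernel_inject=False selects the AutoTP sharding path;
# WORLD_SIZE is set by the deepspeed launcher (assumption for this sketch).
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": int(os.environ.get("WORLD_SIZE", "1"))},
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)
```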
For MLA TP, we need to skip the two low-rank projection layers ("q_a_proj" and "kv_a_proj_with_mqa"), i.e. leave them unsharded, as illustrated below.
For DeepSeek MoE, tp_parse only sees the MoE layer name as layer_idx.down_proj, which makes it hard to add a model-specific policy, so we put the down_proj layer into all_reduce_linears by default.
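A conceptual sketch of what placing down_proj in all_reduce_linears implies (assumes an initialized process group; `row_parallel_down_proj` is a hypothetical helper, not DeepSpeed code): the weight is split row-wise across TP ranks, so each rank produces a partial output that must be summed with an all-reduce.

```python
import torch
import torch.distributed as dist

def row_parallel_down_proj(x_shard: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    """x_shard: [batch, hidden/tp], w_shard: [hidden/tp, out]."""
    partial = x_shard @ w_shard                      # local partial result on this rank
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)   # sum partial outputs across TP ranks
    return partial
```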