Add `base_model_tp_plan` to OlmoeConfig


46 days ago

Enable tensor parallel loading for OLMoE models via `from_pretrained(tp_plan="auto")`. Uses "colwise" for q_norm and k_norm (not "replicated_with_grad_allreduce") because OLMoE applies these norms after the q/k projections, so the norm weight dimensions must match the sharded projection output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
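The shape argument above can be sketched in a few lines. This is a toy illustration, not the Transformers implementation: plain Python lists stand in for tensors, the sizes `HIDDEN` and `TP_SIZE` and the helpers `shard_colwise` and `scale` are hypothetical, and only the elementwise weight multiply of the norm is modeled (the normalization statistic itself is left out).

```python
# Toy sketch: plain lists stand in for tensors; all names and sizes are hypothetical.
HIDDEN = 8    # assumed full width of the q projection output
TP_SIZE = 2   # assumed number of tensor-parallel ranks


def shard_colwise(vector, rank, tp_size):
    """Return the contiguous slice of `vector` owned by `rank`."""
    width = len(vector) // tp_size
    return vector[rank * width:(rank + 1) * width]


def scale(activation, weight):
    """Elementwise multiply by a norm weight; the two sizes must agree."""
    if len(activation) != len(weight):
        raise ValueError(
            f"size mismatch: activation {len(activation)} vs weight {len(weight)}"
        )
    return [a * w for a, w in zip(activation, weight)]


q_out = [float(i) for i in range(HIDDEN)]             # full q projection output
norm_weight = [1.0 + 0.1 * i for i in range(HIDDEN)]  # full-size q_norm weight

rank = 1
# What one rank holds after a colwise-sharded q projection.
local_q = shard_colwise(q_out, rank, TP_SIZE)

# Sharding the norm weight the same way keeps the sizes aligned.
local_weight = shard_colwise(norm_weight, rank, TP_SIZE)
scaled = scale(local_q, local_weight)

# A replicated (full-size) weight no longer matches the local activation.
try:
    scale(local_q, norm_weight)
    replicated_ok = True
except ValueError:
    replicated_ok = False
```

With these assumed sizes each rank holds half of the projection output, so only a weight sliced the same way lines up with it, which is the reason the commit message gives for choosing "colwise" over a replicated plan.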