Add `base_model_tp_plan` to OlmoeConfig

Enable tensor parallel loading for OLMoE models via `from_pretrained(tp_plan="auto")`.
Use "colwise" for q_norm and k_norm rather than "replicated_with_grad_allreduce":
OLMoE applies these norms to the outputs of the q/k projections, so each
rank's norm weight must be sharded to match the width of its sharded
projection output.
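
The shape argument above can be sketched in plain Python. This is only an
illustration: the module-name patterns in `hypothetical_tp_plan`, the helper
`local_shapes`, and the sizes used are assumptions for the example, not
copied from the actual OlmoeConfig.

```python
# Illustrative sketch only. The key patterns and sizes below are assumptions,
# not the real contents of OlmoeConfig.base_model_tp_plan.
hypothetical_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.q_norm": "colwise",
    "layers.*.self_attn.k_norm": "colwise",
}


def local_shapes(hidden_size, tp_size, shard_norm):
    """Return (local projection output width, local norm weight length) for one rank."""
    if hidden_size % tp_size != 0:
        raise ValueError("hidden_size must be divisible by tp_size")
    # "colwise" splits the projection's output features across ranks.
    proj_out = hidden_size // tp_size
    # A sharded norm weight shrinks with it; a replicated one keeps full length.
    norm_len = hidden_size // tp_size if shard_norm else hidden_size
    return proj_out, norm_len


# Sharded norm: the weight length matches the local projection output.
assert local_shapes(hidden_size=2048, tp_size=4, shard_norm=True) == (512, 512)

# Replicated norm: a 2048-long weight cannot scale a 512-wide local output.
assert local_shapes(hidden_size=2048, tp_size=4, shard_norm=False) == (512, 2048)
```

With the norm weight replicated, the elementwise scale inside the norm would
see mismatched sizes on every rank, which is the failure that "colwise"
avoids here.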

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>