Add ForSequenceClassification heads for the OLMo family (#45551)
* Add Olmo/Olmo2/Olmo3 ForSequenceClassification
Adds sequence-classification heads to the OLMo family so
`AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")`
(and the Olmo/Olmo3 equivalents) work out of the box.
Implementation follows the modular-inheritance pattern used by
Gemma/Gemma2, Qwen2/Qwen3, and Glm/Glm4: a single hand-written subclass in
`modular_olmo.py` propagates to Olmo2 and Olmo3 through the modular
tooling, so all three classes are backed by the
`GenericForSequenceClassification` mixin.
Also registers the three classes in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES`
and adds autodoc entries to each model's doc page.
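
A rough sketch of the pattern described above, using plain-Python stand-ins
rather than the actual diff. `GenericForSequenceClassification` and
`MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES` are names from this commit;
the stand-in class bodies, the `head` attribute, the Olmo -> Olmo2 -> Olmo3
inheritance chain, the `olmo`/`olmo2`/`olmo3` keys, and the `resolve` helper
are assumptions made for illustration only.

```python
from collections import OrderedDict


class GenericForSequenceClassification:
    """Stand-in for the shared mixin (the real one lives in transformers)."""

    head = "sequence-classification"


class OlmoPreTrainedModel:
    """Stand-in for the Olmo base model class."""


# The single hand-written subclass, roughly as it might look in modular_olmo.py.
class OlmoForSequenceClassification(GenericForSequenceClassification, OlmoPreTrainedModel):
    pass


# Assumed chain: Olmo2 and Olmo3 only inherit; no new logic is written for them.
class Olmo2ForSequenceClassification(OlmoForSequenceClassification):
    pass


class Olmo3ForSequenceClassification(Olmo2ForSequenceClassification):
    pass


# Registration maps a model-type key to a class name (keys are assumed here).
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
        ("olmo", "OlmoForSequenceClassification"),
        ("olmo2", "Olmo2ForSequenceClassification"),
        ("olmo3", "Olmo3ForSequenceClassification"),
    ]
)

_CLASSES = {
    cls.__name__: cls
    for cls in (
        OlmoForSequenceClassification,
        Olmo2ForSequenceClassification,
        Olmo3ForSequenceClassification,
    )
}


def resolve(model_type):
    """Hypothetical lookup: model-type key -> stand-in class object."""
    return _CLASSES[MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES[model_type]]
```

In this sketch every resolved class has the mixin in its MRO, which is the
property the modular pattern is meant to guarantee; the real tooling generates
standalone modeling files instead of keeping the inheritance chain at runtime.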
Coordination: https://github.com/huggingface/transformers/issues/45529
Maintainer approval: @Rocketknight1 ("This is welcome! ... happy for it to
be mostly AI-written. Just ping me on the PR for review when it's ready!")
AI assistance: yes, per issue #45529.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add sequence-classification test coverage for Olmo family
For Olmo and Olmo2 (older `ModelTesterMixin` pattern), adds the new class
to `all_model_classes` and wires up `text-classification` + `zero-shot` in
`pipeline_model_mapping`, so standard forward/gradient tests run against
the classification head.
For Olmo3 (newer `CausalLMModelTester` pattern), sets
`sequence_classification_class = Olmo3ForSequenceClassification` on the
model tester, which auto-enables `test_sequence_classification_model`,
`test_sequence_classification_model_for_single_label`, and
`test_sequence_classification_model_for_multi_label` from the base class.
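
The opt-in mechanism for Olmo3 can be sketched with `unittest` alone. This is
a simplified stand-in, not the real test suite: it assumes the shared base
test skips when `sequence_classification_class` is unset, and the
`CausalLMModelTestSketch`, `Olmo3SketchTest`, `NoHeadSketchTest`, and `run`
names are hypothetical. The real tester stores the class object, not a string.

```python
import unittest


class CausalLMModelTester:
    """Stand-in for the shared tester; only the opt-in attribute is modelled."""

    sequence_classification_class = None


class Olmo3ModelTester(CausalLMModelTester):
    # Setting this attribute is what opts the model into the shared tests.
    sequence_classification_class = "Olmo3ForSequenceClassification"


class CausalLMModelTestSketch:
    """Stand-in for the base test class that owns the classification tests."""

    model_tester_class = CausalLMModelTester

    def test_sequence_classification_model(self):
        head = self.model_tester_class.sequence_classification_class
        if head is None:
            self.skipTest("no sequence-classification class configured")
        self.assertEqual(head, "Olmo3ForSequenceClassification")


class Olmo3SketchTest(CausalLMModelTestSketch, unittest.TestCase):
    model_tester_class = Olmo3ModelTester


class NoHeadSketchTest(CausalLMModelTestSketch, unittest.TestCase):
    pass  # inherits the default tester, so the test is skipped


def run(case):
    """Run one TestCase class and return its unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run(Olmo3SketchTest)` executes the inherited test, while
`run(NoHeadSketchTest)` records a skip, mirroring how the attribute enables
the three classification tests without any per-model test code.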
Local verification on MPS: 413 non-TP tests pass; specifically, Olmo3's
three classification tests pass. TP tests (`test_tp_*`) are deselected on
MPS hardware because they are CUDA-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>