transformers
3ddc7139 - Add modular_esmc.py; generate modeling_esmc.py from it

ESMC now follows the modular convention: modular_esmc.py is the source of truth, and modeling_esmc.py is generated from it by utils/modular_model_converter.py and carries the auto-generated header.

Reuse from esm (the natural parent, since it is also a bidirectional protein encoder): `eager_attention_forward`, `rotate_half`, and `apply_rotary_pos_emb` are now imported from ..esm.modeling_esm and inlined into the generated file with `# Copied from` headers, so they stay in sync. `rotate_half` is pulled in transitively as a dependency of `apply_rotary_pos_emb`, matching the qwen3 pattern.

Everything else stays ESMC-specific and is defined in the modular file:
- the SAE-integrated ESMCModel and its ForMaskedLM, SequenceClassification, and TokenClassification heads
- the fused-LN MultiHeadAttention
- the SwiGLU FFN
- TransformerStack
- ESMCRotaryEmbedding
- the SAE-carrying output dataclasses

As expected for this architecture, the deduplication is modest; the win is convention compliance plus automatic syncing of the shared functions.

Before regeneration, the modular file was fixed and formatted with ruff (Optional[X] -> X | None, import order), so both files are now ruff-clean.

Verified:
- `check_modular_conversion.py` passes (files in sync);
- `import transformers` succeeds;
- loading identical weights reproduces the pre-conversion last_hidden_state bit-for-bit (difference 0.0) at all valid positions for plain, padding-mask, and multi-chain inputs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
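The transitive pull-in of `rotate_half` (requested indirectly, because `apply_rotary_pos_emb` calls it) can be illustrated with a small sketch. This is not the logic of utils/modular_model_converter.py; it only assumes that "dependency" means "a top-level function referenced by name inside another function's body" and walks a toy parent module with Python's standard `ast` module. `PARENT_SOURCE`, `resolve_dependencies`, and the helper functions are hypothetical names, and the function bodies are placeholders rather than the real esm code.

```python
import ast

# Hypothetical, heavily simplified stand-in for a parent module such as
# esm/modeling_esm.py. Only the call structure matters for this sketch.
PARENT_SOURCE = '''
def rotate_half(x):
    return x

def apply_rotary_pos_emb(x, cos, sin):
    return x * cos + rotate_half(x) * sin

def eager_attention_forward(q, k, v):
    return q

def unrelated_helper():
    return None
'''


def top_level_functions(source: str) -> dict[str, ast.FunctionDef]:
    """Map each top-level function name in `source` to its AST node."""
    tree = ast.parse(source)
    return {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}


def referenced_names(func: ast.FunctionDef) -> set[str]:
    """Every bare name (variables and called functions) used inside `func`."""
    return {node.id for node in ast.walk(func) if isinstance(node, ast.Name)}


def resolve_dependencies(source: str, requested: list[str]) -> list[str]:
    """Return the requested functions plus every top-level function they
    reference, transitively, with dependencies listed before their users."""
    funcs = top_level_functions(source)
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        # Skip names already handled and names that are not top-level functions
        # (arguments, locals, builtins).
        if name in seen or name not in funcs:
            return
        seen.add(name)
        for dep in sorted(referenced_names(funcs[name])):
            visit(dep)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


if __name__ == "__main__":
    # Only two functions are requested; rotate_half comes along transitively,
    # while unrelated_helper is left out.
    print(resolve_dependencies(PARENT_SOURCE, ["apply_rotary_pos_emb", "eager_attention_forward"]))
    # ['rotate_half', 'apply_rotary_pos_emb', 'eager_attention_forward']
```

Emitting dependencies before their users mirrors the requirement that an inlined helper be defined before the function that calls it in the generated file.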
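The commit does not say how the bit-for-bit comparison was computed. One plausible check, sketched here under that assumption, takes the largest absolute difference between the old and new `last_hidden_state` values while skipping padded positions (attention mask 0), whose values carry no meaning. The helper name is hypothetical, and it uses nested lists so it runs without PyTorch; on real tensors of shape (batch, seq_len, hidden) the same reduction is `(before - after).abs()[attention_mask.bool()].max()`.

```python
from typing import Sequence

# [batch][seq_len][hidden] hidden states as nested sequences of floats.
Hidden = Sequence[Sequence[Sequence[float]]]


def max_abs_diff_at_valid_positions(
    before: Hidden,
    after: Hidden,
    attention_mask: Sequence[Sequence[int]],
) -> float:
    """Largest |before - after| over positions where attention_mask is 1.

    A result of exactly 0.0 means the two outputs agree bit-for-bit at every
    non-padded position.
    """
    worst = 0.0
    for seq_before, seq_after, seq_mask in zip(before, after, attention_mask):
        for vec_before, vec_after, keep in zip(seq_before, seq_after, seq_mask):
            if not keep:
                # Padded position: ignored on purpose.
                continue
            for x, y in zip(vec_before, vec_after):
                worst = max(worst, abs(x - y))
    return worst


if __name__ == "__main__":
    before = [[[1.0, 2.0], [3.0, 4.0]]]
    after = [[[1.0, 2.0], [9.0, 9.0]]]
    # The second position differs, but it is padding, so it is skipped.
    print(max_abs_diff_at_valid_positions(before, after, [[1, 0]]))  # 0.0
    # With the second position marked valid, its difference counts.
    print(max_abs_diff_at_valid_positions(before, after, [[1, 1]]))  # 6.0
```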