DeepSpeed
f3a9819c - Add EXAONE 4.0 model support for Inference V2 (#7853)

Commit
19 days ago
Add EXAONE 4.0 model support for Inference V2 (#7853)

## Summary

Add support for LG AI Research's EXAONE 4.0 model family in DeepSpeed Inference V2.

Closes #7453

## Changes

- New model implementation: `deepspeed/inference/v2/model_implementations/exaone4/`
  - `container.py`: Transformer and non-transformer parameter containers
  - `model.py`: Inference model with post-norm architecture and QK-Norm support
  - `policy.py`: Inference V2 policy
- Register EXAONE 4.0 in `engine_factory.py` and `__init__.py`

## Key architectural differences from Mistral/Llama

- **Post-norm**: RMSNorm is applied after attention/MLP outputs (not before), followed by residual addition
- **QK-Norm**: Per-head RMSNorm applied to Q and K projections after the QKV linear layer
- **Hybrid attention**: 32B model uses 3:1 sliding window/full attention ratio (via `layer_types` config)

## Supported models

- [EXAONE-4.0-1.2B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-1.2B) (all full attention)
- [EXAONE-4.0-32B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B) (hybrid sliding/full attention)

Requires `transformers >= 4.54.0`.

## Related

- Supersedes #7456 (draft, inactive for 6 months)

---------

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
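The architectural differences listed in the commit message can be illustrated with a small, dependency-free sketch. This is not the code from `model.py`. The function names (`rms_norm`, `pre_norm_block`, `post_norm_block`, `qk_norm`, `hybrid_layer_types`) are hypothetical, vectors are plain Python lists, and the placement of the full-attention layer as every fourth layer is an assumption about how a 3:1 ratio might be laid out in `layer_types`.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """RMSNorm over a 1-D vector: x / sqrt(mean(x^2) + eps) * weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def pre_norm_block(x, sublayer, weight):
    """Llama/Mistral style: normalize the input, run the sublayer, add the residual."""
    return [a + b for a, b in zip(x, sublayer(rms_norm(x, weight)))]


def post_norm_block(x, sublayer, weight):
    """Post-norm as described above: normalize the sublayer output, then add the residual."""
    return [a + b for a, b in zip(x, rms_norm(sublayer(x), weight))]


def qk_norm(proj, n_heads, weight):
    """Apply RMSNorm independently to each head's slice of a flat Q or K projection.

    `weight` has length head_dim and is shared across heads (an assumption).
    """
    head_dim = len(proj) // n_heads
    out = []
    for h in range(n_heads):
        out.extend(rms_norm(proj[h * head_dim:(h + 1) * head_dim], weight))
    return out


def hybrid_layer_types(n_layers, sliding_per_full=3):
    """Hypothetical helper: every (sliding_per_full + 1)-th layer uses full attention."""
    period = sliding_per_full + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "sliding_attention"
        for i in range(n_layers)
    ]


# Toy usage: a "sublayer" that doubles its input stands in for attention or the MLP.
x = [1.0, 2.0, 3.0, 4.0]
w = [1.0] * len(x)
double = lambda v: [2.0 * t for t in v]

pre = pre_norm_block(x, double, w)    # residual + sublayer(norm(x))
post = post_norm_block(x, double, w)  # residual + norm(sublayer(x))
layers = hybrid_layer_types(8)        # three sliding layers, then one full, repeated
```

With post-norm, the contribution added to the residual stream always has roughly unit RMS regardless of the sublayer's output scale, which is why the two blocks give different results for the same toy sublayer.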