Add Apertus (#39381) - SemanticDiff

Commit

149 days ago

Add Apertus (#39381)

* init swissai model
* AutoModelForCausalLM
* AutoModelForCausalLM mapping
* qk norm and post ln optional
* fix wrong shape of qk norm: megatron uses head_dim
* automodel fixes
* minor fix in forward
* fix rope validation to accept llama3 scaling
* `SwissAIForTokenClassification` support
* Align `SwissAI` to v4.52.4
* Align `SwissAI` to v4.53.1
* Init CUDA xIELU
* `SwissAI*`->`Apertus*`
* ci fix
* check_docstring ignore ApertusConfig
* Licensing and placeholder tests
* Placeholder doc
* XIELU syntax
* `_xielu_python` optimization
* Fix xIELU
* [tmp] `{beta,eps}` persistent=False until {beta,eps} saved in checkpoint
* Modular `Apertus`
* CUDA xIELU logging
* ci fix
* ci fix
* ci fix
* Update license
  Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update tests/models/apertus/test_modeling_apertus.py
  Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* `.utils.import_utils.is_torchdynamo_compiling`
* `Apertus` class ordering
* `past_key_value{->s}`, `make fix-copies`
* ci fix
* Remove unused configuration parameters
* `{beta,eps}` saved in checkpoint
* `{beta,eps}` Temporarily on CPU
* Suggestions
  Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* ci fix
* remove fx_compatible (deprecated)
* remove `rotary_embedding_layer`
  As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.
* fully removing `Mask4DTestHard` class
  Not needed (for now)
* switch to `dtype` instead of `torch_dtype`
  Following this: https://github.com/huggingface/transformers/pull/39782
* remove unused imports
* remove `cache_implementation="static"`
* +Apertus to `docs/source/en/_toctree.yml` for the doc builder

---------

Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>

References

#39381 - Add Apertus

Author

EduardDurech

Parents

f9b9a5e8

transformers d10603f7 - Add Apertus (#39381)