Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek (see the YaRN sketch after the list)
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequence doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotary embedding from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except integration tests
* fix integration tests
* cleanup BC stuff
* rope fixes
* fix integration tests based on A10
* style
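
For context on the `_compute_yarn_parameters` adaptation above, here is a minimal sketch of YaRN-style inverse-frequency computation. The helper name and default values are assumptions for illustration, not the exact transformers implementation; DeepSeek V2 reads the corresponding values from its `rope_scaling` config.

```python
import math

import torch


def compute_yarn_inv_freq(dim, base=10000.0, factor=4.0,
                          original_max_position=4096,
                          beta_fast=32.0, beta_slow=1.0):
    # Standard RoPE inverse frequencies, and their position-interpolated variant.
    pos_freqs = base ** (torch.arange(0, dim, 2).float() / dim)
    inv_freq_extrapolation = 1.0 / pos_freqs
    inv_freq_interpolation = 1.0 / (factor * pos_freqs)

    # Index of the dimension whose wavelength completes `num_rotations` turns
    # within the original context window.
    def correction_dim(num_rotations):
        return (dim * math.log(original_max_position / (num_rotations * 2 * math.pi))) / (
            2 * math.log(base)
        )

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)

    # Linear ramp: high-frequency dims keep the original (extrapolated) freqs,
    # low-frequency dims are interpolated, with a smooth blend in between.
    ramp = torch.clamp((torch.arange(dim // 2).float() - low) / max(high - low, 1), 0, 1)
    return inv_freq_interpolation * ramp + inv_freq_extrapolation * (1 - ramp)


inv_freq = compute_yarn_inv_freq(dim=64, factor=8.0)
print(inv_freq.shape)  # torch.Size([32])
```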
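
With the model merged, loading should work through the Auto classes. A minimal usage sketch; the checkpoint id `deepseek-ai/DeepSeek-V2-Lite`, the prompt, and the generation settings are assumptions for illustration.

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id assumed for illustration; any DeepSeek V2 checkpoint with the
# native transformers architecture should load the same way.
model_id = "deepseek-ai/DeepSeek-V2-Lite"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("An attention function can be described as", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```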
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>