transformers
Add DeepSeek V2 Model into Transformers
#36400
Merged


VladOS95-cyber
VladOS95-cyber force pushed from a43e9e3d to 1a4f2d45 311 days ago
VladOS95-cyber force pushed from 1f95056a to 17ffdce1 302 days ago
Rocketknight1
VladOS95-cyber force pushed from be7eabf5 to c7c1d28a 299 days ago
VladOS95-cyber marked this pull request as ready for review 299 days ago
github-actions requested a review from ArthurZucker 299 days ago
github-actions requested a review from Rocketknight1 299 days ago
VladOS95-cyber
Rocketknight1
VladOS95-cyber
VladOS95-cyber
VladOS95-cyber force pushed from 5aeadaa7 to e2928965 297 days ago
VladOS95-cyber force pushed from beac6a57 to 99d5d6c2 296 days ago
Rocketknight1
VladOS95-cyber force pushed from 99d5d6c2 to a8ed8861 295 days ago
VladOS95-cyber
ArthurZucker
ArthurZucker commented on 2025-03-20
VladOS95-cyber force pushed from 7dae9da8 to f9a98e46 283 days ago
VladOS95-cyber
VladOS95-cyber requested a review from ArthurZucker 283 days ago
ArthurZucker
VladOS95-cyber
VladOS95-cyber force pushed from 5c019739 to 7ae44890 282 days ago
VladOS95-cyber force pushed from 7ae44890 to 04522eee 281 days ago
VladOS95-cyber add initial structure
e690c32f
VladOS95-cyber doc fixes, add model base logic
4c4e6e43
VladOS95-cyber update init files
f544bbad
VladOS95-cyber some fixes to config and modular
841e47ac
VladOS95-cyber some improvements for attention
436551b1
VladOS95-cyber format
03e4b081
VladOS95-cyber remove unused attn
42c7a067
VladOS95-cyber some fixes for moe layer and for decoder
53c674af
VladOS95-cyber adapt _compute_yarn_parameters for deepseek
a6dc5bb6
VladOS95-cyber format
16a74e36
VladOS95-cyber small fix
2c6771ff
VladOS95-cyber fix for decoder forward
f19cd716
VladOS95-cyber add tests, small refactoring
4a1fecea
VladOS95-cyber fix dummies
221a8f41
VladOS95-cyber fix init
5899ceec
VladOS95-cyber fix doc
1ab69681
VladOS95-cyber fix config docs
a59cd271
VladOS95-cyber add sequce doc, fix init for gate
b8fe8aba
VladOS95-cyber fix issues in tests
a2bfdb4c
VladOS95-cyber fix config doc
a587a1f2
VladOS95-cyber remove unused args
a6eea985
VladOS95-cyber some fixes and refactoring after review
9d190553
VladOS95-cyber fix doc for config
faf0a096
VladOS95-cyber small fixes for config args
577165a5
VladOS95-cyber revert config refactoring
5028c49d
VladOS95-cyber small refactoring
073731a8
VladOS95-cyber minor fixes after rebase
65ae28aa
VladOS95-cyber force pushed from 04522eee to 65ae28aa 280 days ago
VladOS95-cyber
ArthurZucker
VladOS95-cyber
ArthurZucker
VladOS95-cyber
Cyrilvallez
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
6b8b757d
VladOS95-cyber small fix after merge
f7ad7bf8
VladOS95-cyber
VladOS95-cyber fix modular
39d504a3
VladOS95-cyber remove rotaryembd from public init
832d548b
VladOS95-cyber small test fix
6ecbcb2f
VladOS95-cyber
VladOS95-cyber
ArthurZucker
ArthurZucker
ArthurZucker commented on 2025-05-26
VladOS95-cyber
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
168a5968
VladOS95-cyber some rotary pos calculation improvement
9af8f166
VladOS95-cyber
VladOS95-cyber fix format
22aea3e6
VladOS95-cyber requested a review from ArthurZucker 211 days ago
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
1ff51da2
Cyrilvallez
Cyrilvallez commented on 2025-06-16
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
e126455a
VladOS95-cyber some improvements and fixes
3a023fd7
VladOS95-cyber
VladOS95-cyber fix config
5f1ff0a8
VladOS95-cyber requested a review from Cyrilvallez 198 days ago
VladOS95-cyber
Cyrilvallez
Cyrilvallez commented on 2025-06-25
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
834650bc
VladOS95-cyber some refactoring
8625b470
VladOS95-cyber
VladOS95-cyber requested a review from Cyrilvallez 187 days ago
VladOS95-cyber adjust some unit tests
60f6c4e4
VladOS95-cyber skip test
335140bb
Cyrilvallez
Cyrilvallez approved these changes on 2025-07-02
Cyrilvallez Merge branch 'main' into add-deepseekv2
773b4a94
Cyrilvallez
Cyrilvallez
github-actions
VladOS95-cyber
Cyrilvallez
Cyrilvallez
VladOS95-cyber
VladOS95-cyber
Cyrilvallez
VladOS95-cyber Merge remote-tracking branch 'upstream/main' into add-deepseekv2
711a4455
VladOS95-cyber small fixes and tests adjustment
40e28e14
VladOS95-cyber Merge branch 'add-deepseekv2' of https://github.com/VladOS95-cyber/tr…
8f840578
github-actions
VladOS95-cyber
VladOS95-cyber requested a review from Cyrilvallez 183 days ago
Cyrilvallez
github-actions
Cyrilvallez Merge branch 'main' into add-deepseekv2
47918fb9
github-actions
Cyrilvallez reapply modular
3fbb7e17
github-actions
Cyrilvallez
Cyrilvallez
github-actions
VladOS95-cyber
Cyrilvallez
VladOS95-cyber
Cyrilvallez fix all tests except Integration
59e1ddb8
github-actions
Cyrilvallez fix integration testzs
019efb51
github-actions
Cyrilvallez cleanup BC stuff
8e859354
Cyrilvallez
github-actions
github-actions
HuggingFaceDocBuilderDev
Cyrilvallez rope
89a2e717
github-actions
Cyrilvallez
github-actions
Cyrilvallez fix integrations tests based on a10
4733bf17
github-actions
Cyrilvallez style
797bf101
github-actions
Cyrilvallez
Cyrilvallez approved these changes on 2025-07-09
Cyrilvallez merged c9809042 into main 178 days ago
VladOS95-cyber
ArthurZucker added the New model label
geetu040
VladOS95-cyber