Add DeepSeek V2 Model into Transformers #36400
VladOS95-cyber
marked this pull request as ready for review 299 days ago
add initial structure
e690c32f
doc fixes, add model base logic
4c4e6e43
update init files
f544bbad
some fixes to config and modular
841e47ac
some improvements for attention
436551b1
format
03e4b081
remove unused attn
42c7a067
some fixes for moe layer and for decoder
53c674af
adapt _compute_yarn_parameters for deepseek
a6dc5bb6
format
16a74e36
small fix
2c6771ff
fix for decoder forward
f19cd716
add tests, small refactoring
4a1fecea
fix dummies
221a8f41
fix init
5899ceec
fix doc
1ab69681
fix config docs
a59cd271
add sequce doc, fix init for gate
b8fe8aba
fix issues in tests
a2bfdb4c
fix config doc
a587a1f2
remove unused args
a6eea985
some fixes and refactoring after review
9d190553
fix doc for config
faf0a096
small fixes for config args
577165a5
revert config refactoring
5028c49d
small refactoring
073731a8
minor fixes after rebase
65ae28aa
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
6b8b757d
small fix after merge
f7ad7bf8
fix modular
39d504a3
remove rotaryembd from public init
832d548b
small test fix
6ecbcb2f
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
168a5968
some rotary pos calculation improvement
9af8f166
fix format
22aea3e6
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
1ff51da2
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
e126455a
some improvements and fixes
3a023fd7
fix config
5f1ff0a8
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
834650bc
some refactoring
8625b470
adjust some unit tests
60f6c4e4
skip test
335140bb
Merge branch 'main' into add-deepseekv2
773b4a94
Merge remote-tracking branch 'upstream/main' into add-deepseekv2
711a4455
small fixes and tests adjustment
40e28e14
Merge branch 'add-deepseekv2' of https://github.com/VladOS95-cyber/tr…
8f840578
Merge branch 'main' into add-deepseekv2
47918fb9
reapply modular
3fbb7e17
fix all tests except Integration
59e1ddb8
fix integration testzs
019efb51
cleanup BC stuff
8e859354
rope
89a2e717
fix integrations tests based on a10
4733bf17
style
797bf101