transformers
[WIP] add deepseek-v3
#35926
Merged
ArthurZucker merged 101 commits into huggingface:main from bzantium:feature/#35425
init commit
b926c3da
Merge branch 'main' of github.com:huggingface/transformers into updat…
c62c5b76
style
5b850236
take comments into account
3b76bdae
add deepseekv3 modeling
704767e0
Merge branch 'main' into feature/#35425
737ee3af
Merge branch 'main' of https://github.com/bzantium/transformers into …
fc3a4c7a
remove redundant code
244e793c
Merge branch 'feature/#35425' of https://github.com/bzantium/transfor…
0968df54
apply make style
4fb2a80b
apply fix-copies
6b002e5e
make format
4ec1e887
add init files
114ab84c
rename deepseekv3 into deepseek_v3 based on its model_type
779f8d2a
rename deepseekv3 into deepseek_v3 based on its model_type
22623a39
deepseek-v3 not deepseek_v3
78b19b05
set model_type as deepseek_v3
eb0e3a4e
use default docs
57088cc5
apply make
0ef561b9
fill type and docstring
9a75a56a
bzantium changed the title from [WIP] add deepseekv3 to [WIP] add deepseek-v3 (324 days ago)
ruidazeng
approved these changes on 2025-01-29
add rope_config_validation
cdf83e45
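The `rope_config_validation` helper added here lives in `transformers.modeling_rope_utils` and sanity-checks a config's `rope_scaling` dict at init time. A simplified pure-Python sketch of the kind of check it performs (the key names and allowed types below are illustrative, not the exact transformers schema):

```python
def validate_rope_config(rope_scaling):
    """Reject obviously malformed rope_scaling dicts (simplified sketch)."""
    if rope_scaling is None:
        return  # no scaling requested, nothing to validate
    allowed_types = {"linear", "dynamic", "yarn"}
    rope_type = rope_scaling.get("rope_type")
    if rope_type not in allowed_types:
        raise ValueError(f"Unknown rope_type: {rope_type!r}")
    factor = rope_scaling.get("factor")
    if not isinstance(factor, (int, float)) or factor < 1.0:
        raise ValueError(f"`factor` must be a number >= 1, got {factor!r}")

validate_rope_config({"rope_type": "yarn", "factor": 4.0})  # passes silently
```

Calling this from the config constructor surfaces bad checkpoint configs immediately instead of failing deep inside the rotary-embedding code.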
use custom DeepseekV3MLP
51990b94
ArthurZucker
commented on 2025-01-29
hold code only for checkpoint configuration; remove redundant
f4f0ebd8
revise rope yarn for DeepSeek variation
4b72b30b
Merge branch 'main' into feature/#35425
96562c41
rename DeepSeek-V3
6792cb52
ArthurZucker
commented on 2025-01-30
some refactoring
3bf3b323
revise load_hook to work properly; make moe func trainable; use llama…
24bc8b2c
fix attention forward
5c0cd917
use -1 for the unchanged dim when using expand
8e994dd8
refactor DeepseekV3TopkRouter
7405a95f
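`DeepseekV3TopkRouter` implements DeepSeek-V3's auxiliary-loss-free routing: a sigmoid score per expert, a correction bias added only for expert *selection*, and routing weights taken from the uncorrected scores. A minimal pure-Python sketch with scalar lists in place of tensors (function and variable names are illustrative, not the modeling code):

```python
import math

def topk_route(logits, bias, top_k):
    """Pick top_k experts by bias-corrected score; weight by raw score."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]    # sigmoid gate
    corrected = [s + b for s, b in zip(scores, bias)]        # bias affects selection only
    chosen = sorted(range(len(scores)),
                    key=lambda i: corrected[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}            # normalize raw scores

# the bias steers selection toward expert 2 without entering its weight
weights = topk_route([2.0, 0.5, 0.4], [0.0, 0.0, 2.0], top_k=2)
```

Because the bias only reorders the selection and never appears in the normalized weights, load balancing can be tuned without perturbing the forward output of the chosen experts.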
use reshape_for_rope instead of load_hook; revise attention forward f…
ea3c9225
register pre_hook and hook both
c8132687
make style
4ab2f9e8
use n_shared_experts
c5429ec7
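`n_shared_experts` counts the experts that process every token in addition to the routed ones; the MoE output is the shared-expert output plus the routing-weighted sum of the selected routed experts. A toy sketch with experts as plain functions (illustrative only, not the modeling code):

```python
def moe_forward(x, shared_experts, routed_experts, routing_weights):
    """Toy DeepSeek-style MoE combine: shared experts always run,
    routed experts contribute according to their routing weight."""
    out = sum(expert(x) for expert in shared_experts)
    out += sum(w * routed_experts[i](x) for i, w in routing_weights.items())
    return out

double = lambda x: 2 * x
triple = lambda x: 3 * x
# one shared expert, plus routed expert 1 selected at weight 1.0
y = moe_forward(1.0, [double], [double, triple], {1: 1.0})  # -> 5.0
```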
ruidazeng
approved these changes on 2025-02-13
ArthurZucker
commented on 2025-02-13
Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
4df42f0b
Merge branch 'main' of github.com:huggingface/transformers into featu…
e0a49ac8
Merge branch 'feature/#35425' of github.com:bzantium/transformers int…
dfd9abc2
add test file
ba21b7c1
Merge branch 'feature/#35425' of https://github.com/bzantium/transfor…
22701731
update modeling_file according to modular file
b5f420b7
make style
6bd75a9a
add mapping for DeepseekV3ForSequenceClassification
6ccbc663
remove aux_loss_alpha
a1c62743
add deepseek_v3 for perf
a80462b3
add deepseek_v3
dd78f48c
rename test as deepseekv3
54481ef5
use tiny-deepseek-v3
e0f1c2dd
Merge branch 'main' into feature/#35425
23fb7569
remove DeepseekV3ForSequenceClassification
52147415
cache before padding
67f1f0ca
remote output_router_logits
f264f800
Revert "remote output_router_logits"
d4c6a1bd
remove output_router_logits
c7c8d766
Merge branch 'main' into feature/#35425
0b5ff07e
make e_score_correction_bias as buffer
ba6f7d40
skip tests not compatible
d7931b32
make style
92bd99cb
make e_score_correction_bias as buffer
7d81efea
mseeger
commented on 2025-02-18
use rope_interleave instead of load_hook
b33fdb5b
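DeepSeek-V3 checkpoints store the rotary dimensions of q/k interleaved ([x0, y0, x1, y1, ...]) rather than in the half-split layout ([x0, x1, ..., y0, y1, ...]) that the llama-style `rotate_half` expects; the `rope_interleave` flag handles this at rotation time instead of permuting weights in a load hook. A pure-Python sketch of the idea, with scalar `cos`/`sin` for brevity (names are illustrative):

```python
def deinterleave(vec):
    """Rearrange [x0, y0, x1, y1, ...] into [x0, x1, ..., y0, y1, ...]."""
    return vec[0::2] + vec[1::2]

def rotate_half(vec):
    """Llama-style rotation: negate the second half and swap halves."""
    half = len(vec) // 2
    return [-v for v in vec[half:]] + vec[:half]

def apply_rope(vec, cos, sin, interleaved):
    if interleaved:              # checkpoint stores rope dims interleaved
        vec = deinterleave(vec)  # normalize to the half-split layout first
    return [v * cos + r * sin for v, r in zip(vec, rotate_half(vec))]
```

Doing the rearrangement in the forward pass keeps the checkpoint weights untouched, which is simpler than rewriting tensors in a `load_state_dict` hook.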
skip tests not compatible with MLA
7f859f8d
add doc for rope_interleave
397ecf3c
fix typo
2628438a
remove torch.no_grad for selecting topk
af3d3285
mseeger
requested changes on 2025-02-19
Merge branch 'main' of github.com:huggingface/transformers into featu…
f0357f9e
fix post merge issue
14e7d4e9
Merge branch 'main' of github.com:huggingface/transformers into updat…
5c854904
merge with main and simplify
1d8516d5
nits
9b4f4333
final
abffdfeb
small fixes
6c7eaa51
Merge branch 'main' of github.com:huggingface/transformers into updat…
71d47f48
fix
9e4965ad
support TP better
6bb8802d
stash
426d9413
Merge branch 'update-from-pretrained' of github.com:huggingface/trans…
d4d60c37
changes currently requires
f0a83891
remove synch
4b8a8578
more fixes for TP
eedbf599
temp fix for TP: some attention layers' FP8 scales are too small + …
409f3412
updates to have generation work!
3fb9bea5
push most of the changes
7350a5d4
reorder functions + call for contributions!
a50c3512
update readme
24557c37
nits
d7da38bc
update
186e32b9
Merge branch 'main' of github.com:huggingface/transformers into featu…
c198b4b6
ruff was updated on main
ee33cf7b
merge with main and fix copies
f2bb6f98
revert unrelated changes
8cefd1c9
route all tokens to all experts when testing to avoid no-gradient issues
a8fff20e
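In a tiny test model, top-k routing can leave some experts with zero tokens and therefore no gradient, so the tests route every token to every expert. A self-contained sketch of the idea (names are illustrative):

```python
def routed_expert_indices(scores, top_k, route_all=False):
    """Pick experts for one token; with route_all (a test-only switch)
    every expert is selected so each one receives gradient in backprop."""
    k = len(scores) if route_all else top_k
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

# normal routing picks the 2 best experts; test mode picks all 4
routed_expert_indices([0.1, 0.7, 0.2, 0.5], top_k=2)                  # -> [1, 3]
routed_expert_indices([0.1, 0.7, 0.2, 0.5], top_k=2, route_all=True)  # -> [1, 3, 2, 0]
```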
finish fixing all tests
13019a7f
ArthurZucker added the New model label
fixup
9b310a16
nit
e3628a3e
clean config
9eb38e6d
last readme changes
8cb959b8
nit
a55630b6
do cnit
bce20738
typo
a1f1f3fc
last nit
d2ae0720
one more one more
372efd65
bzantium
commented on 2025-03-28
ArthurZucker merged commit eca74d13 into main (266 days ago)
Reviewers: ruidazeng, mseeger, ArthurZucker
Assignees: no one assigned
Labels: New model
Milestone: no milestone