transformers
[WIP] add deepseek-v3
#35926
Merged

bzantium
ArthurZucker init commit
b926c3da
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into updat…
c62c5b76
ArthurZucker style
5b850236
ArthurZucker take comments into account
3b76bdae
bzantium add deepseekv3 modeling
704767e0
bzantium Merge branch 'main' into feature/#35425
737ee3af
bzantium Merge branch 'main' of https://github.com/bzantium/transformers into …
fc3a4c7a
bzantium remove redundant code
244e793c
bzantium Merge branch 'feature/#35425' of https://github.com/bzantium/transfor…
0968df54
bzantium apply make style
4fb2a80b
bzantium apply fix-copies
6b002e5e
bzantium make format
4ec1e887
bzantium add init files
114ab84c
Rocketknight1
bzantium rename deepseekv3 into deepseek_v3 based on its model_type
779f8d2a
bzantium rename deepseekv3 into deepseek_v3 based on its model_type
22623a39
bzantium deepseek-v3 not deepseek_v3
78b19b05
bzantium set model_type as deepseek_v3
eb0e3a4e
bzantium use default docs
57088cc5
bzantium apply make
0ef561b9
bzantium fill type and docstring
9a75a56a
bzantium changed the title from "[WIP] add deepseekv3" to "[WIP] add deepseek-v3" 324 days ago
ruidazeng
ruidazeng approved these changes on 2025-01-29
bzantium add rope_config_validation
cdf83e45
bzantium use custom DeepseekV3MLP
51990b94
ArthurZucker
ArthurZucker commented on 2025-01-29
cuichenx
ArthurZucker
casper-hansen
bzantium hold code only for checkpoints congifuration; remove redundant
f4f0ebd8
bzantium revise rope yarn for DeepSeek variation
4b72b30b
bzantium
bzantium
bzantium
bzantium Merge branch 'main' into feature/#35425
96562c41
bzantium rename DeepSeek-V3
6792cb52
casper-hansen
bzantium
ArthurZucker
ArthurZucker commented on 2025-01-30
ArthurZucker
bzantium
ArthurZucker some refactoring
3bf3b323
bzantium revise load_hook to work properly; make moe func trainable; use llama…
24bc8b2c
bzantium
bzantium fix attention forward
5c0cd917
ArthurZucker
bzantium
bzantium use -1 for not-changing dim when to use exapnd
8e994dd8
bzantium refactor DeepseekV3TopkRouter
7405a95f
bzantium
bzantium use reshape_for_rope instead of load_hook; revise attention forward f…
ea3c9225
ArthurZucker
bzantium register pre_hook and hook both
c8132687
bzantium make style
4ab2f9e8
bzantium
ArthurZucker
bzantium
ArthurZucker
mseeger
mseeger
ArthurZucker
bzantium
ArthurZucker
mseeger
mseeger
mseeger
mseeger
bzantium
mseeger
mseeger
mseeger
bzantium
bzantium use n_shared_experts
c5429ec7
mseeger
SunMarc
mseeger
mseeger
bzantium
ruidazeng
ruidazeng approved these changes on 2025-02-13
mseeger
ArthurZucker
bzantium
ArthurZucker
ArthurZucker commented on 2025-02-13
bzantium Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
4df42f0b
Merge branch 'main' of github.com:huggingface/transformers into featu…
e0a49ac8
Merge branch 'feature/#35425' of github.com:bzantium/transformers int…
dfd9abc2
ArthurZucker
ArthurZucker
mseeger
ArthurZucker
bzantium add test file
ba21b7c1
bzantium Merge branch 'feature/#35425' of https://github.com/bzantium/transfor…
22701731
bzantium update modeling_file according to modular file
b5f420b7
bzantium make style
6bd75a9a
bzantium add mapping for DeepseekV3ForSequenceClassification
6ccbc663
bzantium remove aux_loss_alpha
a1c62743
bzantium add deepseek_v3 for perf
a80462b3
bzantium add deepseek_v3
dd78f48c
bzantium rename test as deepseekv3
54481ef5
bzantium use tiny-deepseek-v3
e0f1c2dd
bzantium Merge branch 'main' into feature/#35425
23fb7569
bzantium remove DeepseekV3ForSequenceClassification
52147415
bzantium cache before padding
67f1f0ca
bzantium
ArthurZucker
ArthurZucker
bzantium
mseeger
mseeger
bzantium remote output_router_logits
f264f800
bzantium Revert "remote output_router_logits"
d4c6a1bd
bzantium remove output_router_logits
c7c8d766
bzantium Merge branch 'main' into feature/#35425
0b5ff07e
bzantium make e_score_correction_bias as buffer
ba6f7d40
bzantium skip tests not compatible
d7931b32
bzantium make style
92bd99cb
bzantium
bzantium make e_score_correction_bias as buffer
7d81efea
ArthurZucker
ArthurZucker
mseeger
mseeger commented on 2025-02-18
mseeger
mseeger commented on 2025-02-18
mseeger
mseeger commented on 2025-02-18
mseeger
mseeger commented on 2025-02-18
mseeger
mseeger commented on 2025-02-18
bzantium
bzantium use rope_interleave instead of load_hook
b33fdb5b
bzantium skip tests not compatible with MLA
7f859f8d
bzantium add doc for rope_interleave
397ecf3c
bzantium fix typo
2628438a
bzantium remove torch.no_grad for selecting topk
af3d3285
bzantium
mseeger
mseeger requested changes on 2025-02-19
mseeger
mseeger requested changes on 2025-02-19
mseeger
ArthurZucker
casper-hansen
jianguoz
mseeger
mseeger
jianguoz
ArthurZucker
bzantium
ArthurZucker
agokrani
mseeger
kylesayrs
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into featu…
f0357f9e
ArthurZucker fix post merge issue
14e7d4e9
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into updat…
5c854904
ArthurZucker mrege with main and simplify
1d8516d5
ArthurZucker nits
9b4f4333
ArthurZucker final
abffdfeb
ArthurZucker small fixes
6c7eaa51
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into updat…
71d47f48
ArthurZucker fix
9e4965ad
ArthurZucker support TP better
6bb8802d
ArthurZucker stash
426d9413
ArthurZucker Merge branch 'update-from-pretrained' of github.com:huggingface/trans…
d4d60c37
ArthurZucker changes currently requires
f0a83891
ArthurZucker remove synch
4b8a8578
ArthurZucker
ArthurZucker more fixes for TP
eedbf599
Neo9061
ArthurZucker
ArthurZucker temp fix for TP : some attention layers's FP8 scales are too small + …
409f3412
ArthurZucker
ArthurZucker updates to have generation work!
3fb9bea5
ArthurZucker
ArthurZucker push most of the changes
7350a5d4
ArthurZucker reorder functions + call for contributions!
a50c3512
ArthurZucker update readme
24557c37
ArthurZucker nits
d7da38bc
ArthurZucker update
186e32b9
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into featu…
c198b4b6
ArthurZucker ruff was updated on main
ee33cf7b
ArthurZucker merge with main and fix copies
f2bb6f98
ArthurZucker revert unrelated changes
8cefd1c9
ArthurZucker
ArthurZucker route all tokens to all experts when testing to avoid no gradient iddues
a8fff20e
ArthurZucker finish fixing all tests
13019a7f
ArthurZucker added the "New model" label
ArthurZucker fixup
9b310a16
ArthurZucker nit
e3628a3e
ArthurZucker
ArthurZucker clean config
9eb38e6d
ArthurZucker last readme changes
8cb959b8
ArthurZucker nit
a55630b6
ArthurZucker do cnit
bce20738
ArthurZucker typo
a1f1f3fc
ArthurZucker last nit
d2ae0720
ArthurZucker one more one more
372efd65
bzantium
ArthurZucker
bzantium
bzantium commented on 2025-03-28
ArthurZucker merged eca74d13 into main 266 days ago
ArthurZucker
bzantium
Neo9061
ArthurZucker
ArthurZucker
bzantium
