llama.cpp
Add support for DeepseekV2ForCausalLM
#7519
Merged


fairydreaming
sszymczy Added initial support for DeepseekV2ForCausalLM.
c8c353f8
fairydreaming Merge branch 'ggerganov:master' into deepseek-v2
b24c9ed5
sszymczy Removed unnecessary tensor operations.
03989640
sszymczy Added five new DeepSeek-V2-specific parameters:
b50c07c2
sszymczy Added initial support for DeepSeek-V2-Lite model.
79f84177
sszymczy Corrected mscale calculation.
60509416
sszymczy Added expert_weights_scale parameter for scaling MoE gate weights.
7e4786bb
sszymczy Temporarily hard-coded mscale value for DeepSeek-V2 (FIXME!).
71a74225
sszymczy Replaced hardcoded mscale value with rescaling attn_factor that resul…
f99df46f
sszymczy Whitespace formatting fixes.
3ae7235e
sszymczy Referenced the relevant GitHub discussion instead of providing long c…
68a51030
sszymczy Added YaRN log multiplier model header parameter corresponding to the…
7be56da9
sszymczy Added 16B and 236B model types for DeepSeek-V2.
842ff3fe
sszymczy Removed usage of output bias tensor since it's not present in DeepSee…
c033958d
sszymczy Merge remote-tracking branch 'upstream/master' into deepseek-v2
a54685b9
sszymczy gguf-py : re-add SCALING_YARN_LOG_MUL removed during merge by accident
bb9c3618
github-actions added the python label
mofosyne added the model label
mofosyne added the Review Complexity : Medium label
sszymczy llama : correct llm_build_moe_ffn() arguments in build_arctic()
f3b5e7d4
ggerganov approved these changes on 2024-05-26
ggerganov requested a review from slaren 1 year ago
sszymczy llama : code style corrections
abef8b26
sszymczy llama : rename n_expert_ff to n_ff_exp
a654cd99
sszymczy llama : rename qk_rope_head_dim, qk_nope_head_dim variables to n_embd…
5a3e6b6c
sszymczy llama : remove trailing whitespaces
20769c0f
sszymczy llama : rename moe_intermediate_size variable to n_ff_exp
fac1e804
sszymczy llama : rename n_leading_dense_layer to n_layer_dense_lead
56f70112
ggerganov commented on 2024-05-27
sszymczy llama : use attn_factor in mscale calculation to match the rope_yarn(…
82cec8b8
sszymczy llama : rename query_states, key_states, value_states to q_states, k_…
5cc7ec16
sszymczy llama : print DeepSeek-V2-specific parameters in llm_load_print_meta()
d02130d5
sszymczy convert-hf : fix flake8 Lint errors
bde971a9
sszymczy Merge remote-tracking branch 'upstream/master' into deepseek-v2
98ff6e1b
sszymczy llama : replace ggml_new_tensor_3d + ggml_set_inplace + ggml_set_inpl…
841cd474
sszymczy gguf-py, llama : whitespace formatting fixes
3efb6595
slaren approved these changes on 2024-05-28
fairydreaming merged commit ee3dff6b into master 1 year ago
ggerganov commented on 2024-05-29
