PR #7519 Add support for DeepseekV2ForCausalLM

fairydreaming merged 30 commits into ggml-org:master from fairydreaming:deepseek-v2

Added initial support for DeepseekV2ForCausalLM.

c8c353f8

Merge branch 'ggerganov:master' into deepseek-v2

b24c9ed5

Removed unnecessary tensor operations.

03989640

Added five new DeepSeek-V2-specific parameters:

b50c07c2

Added initial support for DeepSeek-V2-Lite model.

79f84177

Corrected mscale calculation.

60509416

Added expert_weights_scale parameter for scaling MoE gate weights.

7e4786bb
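
As an illustration of what scaling MoE gate weights can look like, here is a minimal sketch. It is not the code from this commit. The function name, the softmax-then-top-k routing, and the point at which the scale is applied are all assumptions made for the example. Only the parameter name `expert_weights_scale` comes from the commit title.

```python
import math

def scaled_topk_gate_weights(logits, top_k, expert_weights_scale=1.0):
    """Return (expert_index, weight) pairs for the top_k experts.

    Hypothetical helper: routing details are assumed, not taken from the PR.
    """
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts (ties resolved by lower index).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Multiply the selected gate weights by a constant scale factor.
    return [(i, probs[i] * expert_weights_scale) for i in top]
```

With a scale of 1.0 the selected weights are plain softmax probabilities. A larger scale amplifies the routed experts' contribution relative to any unscaled path.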

Temporarily hard-coded mscale value for DeepSeek-V2 (FIXME!).

71a74225

Replaced hardcoded mscale value with rescaling attn_factor that resul…

f99df46f

Whitespace formatting fixes.

3ae7235e

Referenced the relevant GitHub discussion instead of providing long c…

68a51030

Added YaRN log multiplier model header parameter corresponding to the…

7be56da9

Added 16B and 236B model types for DeepSeek-V2.

842ff3fe

Removed usage of output bias tensor since it's not present in DeepSee…

c033958d

Merge remote-tracking branch 'upstream/master' into deepseek-v2

a54685b9

gguf-py : re-add SCALING_YARN_LOG_MUL removed during merge by accident

bb9c3618

github-actions added python

mofosyne added model

mofosyne added Review Complexity : Medium

llama : correct llm_build_moe_ffn() arguments in build_arctic()

f3b5e7d4

ggerganov approved these changes on 2024-05-26

ggerganov requested a review from slaren 2 years ago

llama : code style corrections

abef8b26

llama : rename n_expert_ff to n_ff_exp

a654cd99

llama : rename qk_rope_head_dim, qk_nope_head_dim variables to n_embd…

5a3e6b6c

llama : remove trailing whitespaces

20769c0f

llama : rename moe_intermediate_size variable to n_ff_exp

fac1e804

llama : rename n_leading_dense_layer to n_layer_dense_lead

56f70112

ggerganov commented on 2024-05-27

llama : use attn_factor in mscale calculation to match the rope_yarn(…

82cec8b8
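
For readers unfamiliar with the mscale term, the sketch below shows one common form of the YaRN attention magnitude scale with an extra multiplicative factor. It is not the code from this commit. The function and parameter names are hypothetical, and the default of 0.1 for the log multiplier is the value used in the original YaRN formulation, assumed here for illustration.

```python
import math

def yarn_mscale(scale_factor, log_mul=0.1, attn_factor=1.0):
    """Hypothetical helper: attn_factor * (1 + log_mul * ln(scale_factor)).

    scale_factor is the context extension ratio (values > 1 extend the context).
    """
    if scale_factor <= 1.0:
        # No context extension: only the plain attention factor applies.
        return attn_factor
    return attn_factor * (1.0 + log_mul * math.log(scale_factor))
```

For example, `yarn_mscale(40.0)` gives the magnitude scale for a 40x context extension under these assumptions.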

llama : rename query_states, key_states, value_states to q_states, k_…

5cc7ec16

llama : print DeepSeek-V2-specific parameters in llm_load_print_meta()

d02130d5

convert-hf : fix flake8 Lint errors

bde971a9

Merge remote-tracking branch 'upstream/master' into deepseek-v2

98ff6e1b

llama : replace ggml_new_tensor_3d + ggml_set_inplace + ggml_set_inpl…

841cd474

gguf-py, llama : whitespace formatting fixes

3efb6595

slaren approved these changes on 2024-05-28

fairydreaming merged ee3dff6b into master 2 years ago

ggerganov commented on 2024-05-29

Reviewers

ggerganov

slaren

Assignees

No one assigned

Labels

model, python, Review Complexity : Medium

Milestone

No milestone
