llama.cpp
Add support for DeepseekV2ForCausalLM
#7519
Merged
fairydreaming merged 30 commits into ggml-org:master from fairydreaming:deepseek-v2
Added initial support for DeepseekV2ForCausalLM.
c8c353f8
Merge branch 'ggerganov:master' into deepseek-v2
b24c9ed5
Removed unnecessary tensor operations.
03989640
Added five new DeepSeek-V2-specific parameters:
b50c07c2
Added initial support for DeepSeek-V2-Lite model.
79f84177
Corrected mscale calculation.
60509416
Added expert_weights_scale parameter for scaling MoE gate weights.
7e4786bb
Temporarily hard-coded mscale value for DeepSeek-V2 (FIXME!).
71a74225
Replaced hardcoded mscale value with rescaling attn_factor that resul…
f99df46f
Whitespace formatting fixes.
3ae7235e
Referenced the relevant GitHub discussion instead of providing long c…
68a51030
Added YaRN log multiplier model header parameter corresponding to the…
7be56da9
Added 16B and 236B model types for DeepSeek-V2.
842ff3fe
Removed usage of output bias tensor since it's not present in DeepSee…
c033958d
Merge remote-tracking branch 'upstream/master' into deepseek-v2
a54685b9
gguf-py : re-add SCALING_YARN_LOG_MUL removed during merge by accident
bb9c3618
github-actions added python
mofosyne added model
mofosyne added Review Complexity : Medium
llama : correct llm_build_moe_ffn() arguments in build_arctic()
f3b5e7d4
ggerganov approved these changes on 2024-05-26
ggerganov requested a review from slaren 1 year ago
llama : code style corrections
abef8b26
llama : rename n_expert_ff to n_ff_exp
a654cd99
llama : rename qk_rope_head_dim, qk_nope_head_dim variables to n_embd…
5a3e6b6c
llama : remove trailing whitespaces
20769c0f
llama : rename moe_intermediate_size variable to n_ff_exp
fac1e804
llama : rename n_leading_dense_layer to n_layer_dense_lead
56f70112
ggerganov commented on 2024-05-27
llama : use attn_factor in mscale calculation to match the rope_yarn(…
82cec8b8
llama : rename query_states, key_states, value_states to q_states, k_…
5cc7ec16
llama : print DeekSeek-V2-specific parameters in llm_load_print_meta()
d02130d5
convert-hf : fix flake8 Lint errors
bde971a9
Merge remote-tracking branch 'upstream/master' into deepseek-v2
98ff6e1b
llama : replace ggml_new_tensor_3d + ggml_set_inplace + ggml_set_inpl…
841cd474
gguf-py, llama : whitespace formatting fixes
3efb6595
slaren approved these changes on 2024-05-28
fairydreaming merged ee3dff6b into master 1 year ago
ggerganov commented on 2024-05-29
Reviewers: ggerganov, slaren
Assignees: No one assigned
Labels: model, python, Review Complexity : Medium
Milestone: No milestone