llama.cpp
llama : initial Mamba-2 support
#9126
Merged

compilade merged 44 commits into master from compilade/mamba2
compilade
compilade marked this pull request as draft 1 year ago
github-actions added python
github-actions added ggml
compilade llama : initial Mamba-2 support
1f0fea70
compilade ggml : SIMD ggml_ssm_scan for Mamba-2
dceff23f
compilade llama : support running Mamba-Codestral-7B-v0.1
2bfe9de6
compilade llama : fix Mamba-2 conv state saving
aff96920
compilade force pushed from e9b0d198 to aff96920 1 year ago
compilade changed the base branch from compilade/batch-splits to master 1 year ago
compilade marked this pull request as ready for review 1 year ago
compilade added Review Complexity : Medium
compilade llama : remove unused variable
e04910dc
compilade llama : add missing break
fa358e70
ngxson
compilade
compilade convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present
38913dc8
ngxson
Vaibhavs10
Vaibhavs10 commented on 2024-08-23
isr431
learning-chip
molbap
HanClinto
compilade
compilade Merge branch 'master' into compilade/mamba2
0e601caf
hg0428
compilade llama : avoid redundant state copy for Mamba 1 and 2
273e7a49
compilade Merge branch 'master' into compilade/mamba2
7d6cb368
github-actions added testing
compilade metal : attempt to adapt SSM_SCAN for Mamba-2
2c77d799
compilade metal : fix SSM_SCAN pipeline scope
87b97d08
ggerganov
ggerganov commented on 2024-10-02
compilade metal : use log and exp instead of log1pf and expf in SSM_SCAN
03d0e6ea
compilade metal : remove unused arguments for SSM_SCAN
7a351abc
compilade metal : add back n_seqs to SSM_SCAN args
8b15bc6f
compilade metal : fix SSM_SCAN state head offset
5b8ec2b9
compilade metal : fix wrong number of tokens per sequence in SSM_SCAN
62b09b34
compilade Merge branch 'master' into compilade/mamba2
038d9583
compilade ggml : remove unused fast broadcast path in GGML_MUL
805512a7
compilade Merge branch 'master' into compilade/mamba2
7d16e1bc
compilade ggml : avoid multiply by D in GGML_OP_SSM_SCAN
3bc7103d
compilade Merge branch 'master' into compilade/mamba2
8d8f0657
compilade convert : fix flake8 lint
b4e9c599
compilade Merge branch 'master' into compilade/mamba2
1ee6c482
aallgeier
Tangshengku
compilade
yichen-f
compilade Merge branch 'master' into compilade/mamba2
c9ecf620
github-actions added Apple Metal
compilade
yichen-f
compilade
Tangshengku
compilade
Tangshengku
compilade
Tangshengku
gabe-l-hart
compilade Merge branch 'master' into compilade/mamba2
35d06fac
compilade metal : fix confusion between ; and ,
cf4f0a41
compilade metal : add missing args for nb references in ssm_scan_f32_group
6def5cd7
compilade metal : single-user mamba2 inference works
791998b4
compilade
compilade kv-cache : remove const_cast when setting inputs for s_copy
94c3d530
compilade
ggerganov
compilade Merge branch 'master' into compilade/mamba2
929fe85d
compilade convert : avoid AutoConfig for Mamba and Mamba2 hparams
d55b0d06
compilade kv-cache : allow context shift for recurrent models
e94f3932
gabe-l-hart
compilade
gabe-l-hart
Tangshengku
gabe-l-hart
ggerganov
gabe-l-hart
compilade
ggerganov
gabe-l-hart
compilade
ggerganov
gabe-l-hart
gabe-l-hart
compilade Merge branch 'master' into compilade/mamba2
9864bfcd
compilade graph : fix recurrent state copies when avoiding copies
2fa5f2ce
compilade
gabe-l-hart
gabe-l-hart
gabe-l-hart
compilade ggml : fix mamba2 ssm scan when compiled with SVE
757aa623
compilade ggml-cpu : reorder SVE FMA for consistency with other SIMD arches
0b6f6bec
gabe-l-hart
compilade
gabe-l-hart
gabe-l-hart
compilade Merge branch 'master' into compilade/mamba2
a42f2394
compilade cuda : implement ssm scan for Mamba2
f8c7caee
github-actions added Nvidia GPU
compilade Merge branch 'master' into compilade/mamba2
830e5542
compilade
gabe-l-hart
younesbelkada
compilade Merge branch 'master' into compilade/mamba2
afdb6692
compilade
younesbelkada
younesbelkada
ggerganov
gabe-l-hart
compilade
younesbelkada
gabe-l-hart
gabe-l-hart
gabe-l-hart
gabe-l-hart
compilade
compilade mamba : fix mismatched new and delete size for llm_build_mamba
dc1d109d
gabe-l-hart
compilade
ggerganov
gabe-l-hart
gabe-l-hart
gabe-l-hart
gabe-l-hart
ggerganov
ggerganov approved these changes on 2025-07-01
compilade Merge branch 'master' into compilade/mamba2
73de1fd1
compilade cuda : graceful fallback for Mamba-1 models with weird embd size
71bef665
compilade
compilade merged 5d46babd into master 80 days ago
gabe-l-hart
