PR #9126 llama : initial Mamba-2 support

compilade merged 44 commits into master from compilade/mamba2

compilade marked this pull request as draft 1 year ago

github-actions added python

github-actions added ggml

llama : initial Mamba-2 support

1f0fea70

ggml : SIMD ggml_ssm_scan for Mamba-2

dceff23f

llama : support running Mamba-Codestral-7B-v0.1

2bfe9de6

llama : fix Mamba-2 conv state saving

aff96920

compilade force pushed to aff96920 1 year ago

compilade changed the base branch from compilade/batch-splits to master 1 year ago

compilade marked this pull request as ready for review 1 year ago

compilade added Review Complexity : Medium

llama : remove unused variable

e04910dc

llama : add missing break

fa358e70

convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present

38913dc8

Vaibhavs10 commented on 2024-08-23

Merge branch 'master' into compilade/mamba2

0e601caf

llama : avoid redundant state copy for Mamba 1 and 2

273e7a49

Merge branch 'master' into compilade/mamba2

7d6cb368

github-actions added testing

metal : attempt to adapt SSM_SCAN for Mamba-2

2c77d799

metal : fix SSM_SCAN pipeline scope

87b97d08

ggerganov commented on 2024-10-02

metal : use log and exp instead of log1pf and expf in SSM_SCAN

03d0e6ea

metal : remove unused arguments for SSM_SCAN

7a351abc

metal : add back n_seqs to SSM_SCAN args

8b15bc6f

metal : fix SSM_SCAN state head offset

5b8ec2b9

metal : fix wrong number of tokens per sequence in SSM_SCAN

62b09b34

Merge branch 'master' into compilade/mamba2

038d9583

ggml : remove unused fast broadcast path in GGML_MUL

805512a7

Merge branch 'master' into compilade/mamba2

7d16e1bc

ggml : avoid multiply by D in GGML_OP_SSM_SCAN

3bc7103d

Merge branch 'master' into compilade/mamba2

8d8f0657

convert : fix flake8 lint

b4e9c599

Merge branch 'master' into compilade/mamba2

1ee6c482

Merge branch 'master' into compilade/mamba2

c9ecf620

github-actions added Apple Metal

Merge branch 'master' into compilade/mamba2

35d06fac

metal : fix confusion between ; and ,

cf4f0a41

metal : add missing args for nb references in ssm_scan_f32_group

6def5cd7

metal : single-user mamba2 inference works

791998b4

kv-cache : remove const_cast when setting inputs for s_copy

94c3d530

Merge branch 'master' into compilade/mamba2

929fe85d

convert : avoid AutoConfig for Mamba and Mamba2 hparams

d55b0d06

kv-cache : allow context shift for recurrent models

e94f3932

Merge branch 'master' into compilade/mamba2

9864bfcd

graph : fix recurrent state copies when avoiding copies

2fa5f2ce

ggml : fix mamba2 ssm scan when compiled with SVE

757aa623

ggml-cpu : reorder SVE FMA for consistency with other SIMD arches

0b6f6bec

Merge branch 'master' into compilade/mamba2

a42f2394

cuda : implement ssm scan for Mamba2

f8c7caee

github-actions added Nvidia GPU

Merge branch 'master' into compilade/mamba2

830e5542

Merge branch 'master' into compilade/mamba2

afdb6692

mamba : fix mismatched new and delete size for llm_build_mamba

dc1d109d

ggerganov approved these changes on 2025-07-01

Merge branch 'master' into compilade/mamba2

73de1fd1

cuda : graceful fallback for Mamba-1 models with weird embd size

71bef665

compilade merged 5d46babd into master 252 days ago

Reviewers

ggerganov

Vaibhavs10

Assignees

No one assigned

Labels

testing, Nvidia GPU, python, Review Complexity : Medium, ggml, Apple Metal

Milestone

No milestone
