llama.cpp
llama : support Mamba Selective State Space Models
#5328

Merged

compilade merged 43 commits into ggml-org:master from compilade:support-mamba-ssm

compilade marked this pull request as draft 2 years ago

ggerganov commented on 2024-02-05

compilade force pushed 2 years ago

mamba : begin working on support for Mamba SSM

8cd0a286

mamba : begin figuring out how to (ab)use the kv cache for Mamba

5a69a262

mamba : recurrent inference almost works, but incoherent

f680364b

mamba : recurrent inference WORKS!!!

54d3e486

convert : optionally use d_conv and d_state from config.json for Mamba

74eea856

mamba : refactor recurrent conv, resulting in 20% perf increase

9e77061a

ggml : parallelize ggml_exp

3f7233b6

mamba : simplify the conv step with a self-overlapping view

e9cc45ec

mamba : fix self-overlapping view depth stride

81b57bb3

mamba : handle batches of more than 1 token

ffc116f5

ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation

78a853b7

mamba : very basic quantization support

5816ae68

mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator

a3f4a1c7

convert : for Mamba, also consider the "MambaLMHeadModel" arch name

9f55809f

mamba : fix vocab size problems with official models

cd0f33f2

ggml : remove ggml_exp and ggml_soft_plus

de92f156

mamba : remove some useless comments

766db753

convert : fix flake8 linter errors

c52fb3c2

mamba : apply suggestions from code review

6ff34da0

mamba : multiple sequences, but one at a time

8a43ffcf

mamba : in comments, properly refer to KV cells instead of slots

e73eaa7b

mamba : reduce memory usage of ggml_ssm_scan

de50c549

mamba : simultaneous sequence processing

9473ec21

mamba : support llama_kv_cache_seq_cp copy chains

3dcf7982

mamba : make the server and parallel examples work with whole sequences

34e2fca8

mamba : dedicate an input tensor for state copy indices

79d636cc

mamba : adapt perplexity, batched, and batched-bench examples

8f605cfe

mamba : stop abusing attention metadata

206e8ee2

mamba : more correctly update the "used" field of the KV cache

1af1000f

ggml : in ggml_ssm_scan, use a threshold for soft_plus

d52dd501

convert : for Mamba, fallback to internal NeoX tokenizer

b83fbc92

compilade force pushed to b83fbc92 2 years ago

mamba : support state saving and restoring

eefb794b

compilade marked this pull request as ready for review 2 years ago

ggerganov commented on 2024-03-04

ggml : implicitly pass src tensors through dst for Mamba-related ops

2a99d1b2

compilade force pushed from 2a99d1b2 2 years ago

compilade force pushed 2 years ago

mamba : clarify some comments

93fd4b8d

Merge branch 'master' into support-mamba-ssm

5544f521

compilade force pushed to 5544f521 2 years ago

ggerganov approved these changes on 2024-03-07

Merge branch 'master' into support-mamba-ssm

916b5863

server : fix cache_tokens not getting correctly resized

7cd5a1f9

convert-hf : support new metadata keys for Mamba

d8024a48

mamba : rename metadata to be more similar to transformers library

17e4d6c9

mamba : add missing spaces

1c8ea558

convert-hf : omit output.weight when identical with token_embd.weight

d0d32dce

readme : add Mamba to supported models, and add recent API changes

3e5685f7

mamba : move state_seq and state_mask views outside layer loop

39579d3c

compilade merged c2101a2e into master 2 years ago

cold-blue commented on 2024-04-08

Reviewers

ggerganov

cold-blue
