llama : support Mamba Selective State Space Models #5328
compilade
marked this pull request as draft 2 years ago
mamba : begin working on support for Mamba SSM
8cd0a286
mamba : begin figuring out how to (ab)use the kv cache for Mamba
5a69a262
mamba : recurrent inference almost works, but incoherent
f680364b
mamba : recurrent inference WORKS!!!
54d3e486
convert : optionally use d_conv and d_state from config.json for Mamba
74eea856
mamba : refactor recurrent conv, resulting in 20% perf increase
9e77061a
ggml : parallelize ggml_exp
3f7233b6
mamba : simplify the conv step with a self-overlapping view
e9cc45ec
mamba : fix self-overlapping view depth stride
81b57bb3
mamba : handle batches of more than 1 token
ffc116f5
ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation
78a853b7
mamba : very basic quantization support
5816ae68
mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator
a3f4a1c7
convert : for Mamba, also consider the "MambaLMHeadModel" arch name
9f55809f
mamba : fix vocab size problems with official models
cd0f33f2
ggml : remove ggml_exp and ggml_soft_plus
de92f156
mamba : remove some useless comments
766db753
convert : fix flake8 linter errors
c52fb3c2
mamba : apply suggestions from code review
6ff34da0
mamba : multiple sequences, but one at a time
8a43ffcf
mamba : in comments, properly refer to KV cells instead of slots
e73eaa7b
mamba : reduce memory usage of ggml_ssm_scan
de50c549
mamba : simultaneous sequence processing
9473ec21
mamba : support llama_kv_cache_seq_cp copy chains
3dcf7982
mamba : make the server and parallel examples work with whole sequences
34e2fca8
mamba : dedicate an input tensor for state copy indices
79d636cc
mamba : adapt perplexity, batched, and batched-bench examples
8f605cfe
mamba : stop abusing attention metadata
206e8ee2
mamba : more correctly update the "used" field of the KV cache
1af1000f
ggml : in ggml_ssm_scan, use a threshold for soft_plus
d52dd501
convert : for Mamba, fallback to internal NeoX tokenizer
b83fbc92
compilade
force pushed to b83fbc92 2 years ago
mamba : support state saving and restoring
eefb794b
compilade
marked this pull request as ready for review 2 years ago
ggml : implicitly pass src tensors through dst for Mamba-related ops
2a99d1b2
compilade
force pushed from 2a99d1b2 2 years ago
mamba : clarify some comments
93fd4b8d
Merge branch 'master' into support-mamba-ssm
5544f521
compilade
force pushed to 5544f521 2 years ago
ggerganov
approved these changes on 2024-03-07
Merge branch 'master' into support-mamba-ssm
916b5863
server : fix cache_tokens not getting correctly resized
7cd5a1f9
convert-hf : support new metadata keys for Mamba
d8024a48
mamba : rename metadata to be more similar to transformers library
17e4d6c9
mamba : add missing spaces
1c8ea558
convert-hf : omit output.weight when identical with token_embd.weight
d0d32dce
readme : add Mamba to supported models, and add recent API changes
3e5685f7
mamba : move state_seq and state_mask views outside layer loop
39579d3c
compilade
merged c2101a2e into master 2 years ago