llama.cpp
llama : support Mamba Selective State Space Models
#5328
Merged

compilade marked this pull request as draft 2 years ago
ggerganov commented on 2024-02-05
compilade force pushed 2 years ago (6 times)
compilade mamba : begin working on support for Mamba SSM
8cd0a286
compilade mamba : begin figuring out how to (ab)use the kv cache for Mamba
5a69a262
compilade mamba : recurrent inference almost works, but incoherent
f680364b
compilade mamba : recurrent inference WORKS!!!
54d3e486
compilade convert : optionally use d_conv and d_state from config.json for Mamba
74eea856
compilade mamba : refactor recurrent conv, resulting in 20% perf increase
9e77061a
compilade ggml : parallelize ggml_exp
3f7233b6
compilade mamba : simplify the conv step with a self-overlapping view
e9cc45ec
compilade mamba : fix self-overlapping view depth stride
81b57bb3
compilade mamba : handle batches of more than 1 token
ffc116f5
compilade ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation
78a853b7
compilade mamba : very basic quantization support
5816ae68
compilade mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator
a3f4a1c7
compilade convert : for Mamba, also consider the "MambaLMHeadModel" arch name
9f55809f
compilade mamba : fix vocab size problems with official models
cd0f33f2
compilade ggml : remove ggml_exp and ggml_soft_plus
de92f156
compilade mamba : remove some useless comments
766db753
compilade convert : fix flake8 linter errors
c52fb3c2
compilade mamba : apply suggestions from code review
6ff34da0
compilade mamba : multiple sequences, but one at a time
8a43ffcf
compilade mamba : in comments, properly refer to KV cells instead of slots
e73eaa7b
compilade mamba : reduce memory usage of ggml_ssm_scan
de50c549
compilade mamba : simultaneous sequence processing
9473ec21
compilade mamba : support llama_kv_cache_seq_cp copy chains
3dcf7982
compilade mamba : make the server and parallel examples work with whole sequences
34e2fca8
compilade mamba : dedicate an input tensor for state copy indices
79d636cc
compilade mamba : adapt perplexity, batched, and batched-bench examples
8f605cfe
compilade mamba : stop abusing attention metadata
206e8ee2
compilade mamba : more correctly update the "used" field of the KV cache
1af1000f
compilade ggml : in ggml_ssm_scan, use a threshold for soft_plus
d52dd501
compilade convert : for Mamba, fallback to internal NeoX tokenizer
b83fbc92
compilade force pushed to b83fbc92 2 years ago
compilade mamba : support state saving and restoring
eefb794b
compilade marked this pull request as ready for review 2 years ago
ggerganov commented on 2024-03-04
compilade ggml : implicitly pass src tensors through dst for Mamba-related ops
2a99d1b2
compilade force pushed from 2a99d1b2 2 years ago
compilade force pushed 2 years ago
compilade mamba : clarify some comments
93fd4b8d
compilade Merge branch 'master' into support-mamba-ssm
5544f521
compilade force pushed to 5544f521 2 years ago
ggerganov approved these changes on 2024-03-07
compilade Merge branch 'master' into support-mamba-ssm
916b5863
compilade server : fix cache_tokens not getting correctly resized
7cd5a1f9
compilade convert-hf : support new metadata keys for Mamba
d8024a48
compilade mamba : rename metadata to be more similar to transformers library
17e4d6c9
compilade mamba : add missing spaces
1c8ea558
compilade convert-hf : omit output.weight when identical with token_embd.weight
d0d32dce
compilade readme : add Mamba to supported models, and add recent API changes
3e5685f7
compilade mamba : move state_seq and state_mask views outside layer loop
39579d3c
compilade merged c2101a2e into master 2 years ago
cold-blue commented on 2024-04-08