llama.cpp
llama : support Mamba Selective State Space Models
#5328
Merged
Commits (43)
mamba : begin working on support for Mamba SSM
compilade
committed
2 years ago
mamba : begin figuring out how to (ab)use the kv cache for Mamba
compilade
committed
2 years ago
mamba : recurrent inference almost works, but incoherent
compilade
committed
2 years ago
mamba : recurrent inference WORKS!!!
compilade
committed
2 years ago
convert : optionally use d_conv and d_state from config.json for Mamba
compilade
committed
2 years ago
mamba : refactor recurrent conv, resulting in 20% perf increase
compilade
committed
2 years ago
ggml : parallelize ggml_exp
compilade
committed
2 years ago
mamba : simplify the conv step with a self-overlapping view
compilade
committed
2 years ago
mamba : fix self-overlapping view depth stride
compilade
committed
2 years ago
mamba : handle batches of more than 1 token
compilade
committed
2 years ago
ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation
compilade
committed
2 years ago
mamba : very basic quantization support
compilade
committed
2 years ago
mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator
compilade
committed
2 years ago
convert : for Mamba, also consider the "MambaLMHeadModel" arch name
compilade
committed
2 years ago
mamba : fix vocab size problems with official models
compilade
committed
2 years ago
ggml : remove ggml_exp and ggml_soft_plus
compilade
committed
2 years ago
mamba : remove some useless comments
compilade
committed
2 years ago
convert : fix flake8 linter errors
compilade
committed
2 years ago
mamba : apply suggestions from code review
compilade
committed
2 years ago
mamba : multiple sequences, but one at a time
compilade
committed
2 years ago
mamba : in comments, properly refer to KV cells instead of slots
compilade
committed
2 years ago
mamba : reduce memory usage of ggml_ssm_scan
compilade
committed
2 years ago
mamba : simultaneous sequence processing
compilade
committed
2 years ago
mamba : support llama_kv_cache_seq_cp copy chains
compilade
committed
2 years ago
mamba : make the server and parallel examples work with whole sequences
compilade
committed
2 years ago
mamba : dedicate an input tensor for state copy indices
compilade
committed
2 years ago
mamba : adapt perplexity, batched, and batched-bench examples
compilade
committed
2 years ago
mamba : stop abusing attention metadata
compilade
committed
2 years ago
mamba : more correctly update the "used" field of the KV cache
compilade
committed
2 years ago
ggml : in ggml_ssm_scan, use a threshold for soft_plus
compilade
committed
2 years ago
convert : for Mamba, fallback to internal NeoX tokenizer
compilade
committed
2 years ago
mamba : support state saving and restoring
compilade
committed
2 years ago
ggml : implicitly pass src tensors through dst for Mamba-related ops
compilade
committed
2 years ago
mamba : clarify some comments
compilade
committed
2 years ago
Merge branch 'master' into support-mamba-ssm
compilade
committed
2 years ago
Merge branch 'master' into support-mamba-ssm
compilade
committed
2 years ago
server : fix cache_tokens not getting correctly resized
compilade
committed
2 years ago
convert-hf : support new metadata keys for Mamba
compilade
committed
2 years ago
mamba : rename metadata to be more similar to transformers library
compilade
committed
2 years ago
mamba : add missing spaces
compilade
committed
2 years ago
convert-hf : omit output.weight when identical with token_embd.weight
compilade
committed
2 years ago
readme : add Mamba to supported models, and add recent API changes
compilade
committed
2 years ago
mamba : move state_seq and state_mask views outside layer loop
compilade
committed
2 years ago