llama.cpp
llama : support Mamba Selective State Space Models #5328 (Merged)

Commits
  • mamba : begin working on support for Mamba SSM
    compilade committed 2 years ago
  • mamba : begin figuring out how to (ab)use the kv cache for Mamba
    compilade committed 2 years ago
  • mamba : recurrent inference almost works, but incoherent
    compilade committed 2 years ago
  • mamba : recurrent inference WORKS!!!
    compilade committed 2 years ago
  • convert : optionally use d_conv and d_state from config.json for Mamba
    compilade committed 2 years ago
  • mamba : refactor recurrent conv, resulting in 20% perf increase
    compilade committed 2 years ago
  • ggml : parallelize ggml_exp
    compilade committed 2 years ago
  • mamba : simplify the conv step with a self-overlapping view
    compilade committed 2 years ago
  • mamba : fix self-overlapping view depth stride
    compilade committed 2 years ago
  • mamba : handle batches of more than 1 token
    compilade committed 2 years ago
  • ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation
    compilade committed 2 years ago
  • mamba : very basic quantization support
    compilade committed 2 years ago
  • mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator
    compilade committed 2 years ago
  • convert : for Mamba, also consider the "MambaLMHeadModel" arch name
    compilade committed 2 years ago
  • mamba : fix vocab size problems with official models
    compilade committed 2 years ago
  • ggml : remove ggml_exp and ggml_soft_plus
    compilade committed 2 years ago
  • mamba : remove some useless comments
    compilade committed 2 years ago
  • convert : fix flake8 linter errors
    compilade committed 2 years ago
  • mamba : apply suggestions from code review
    compilade committed 2 years ago
  • mamba : multiple sequences, but one at a time
    compilade committed 2 years ago
  • mamba : in comments, properly refer to KV cells instead of slots
    compilade committed 2 years ago
  • mamba : reduce memory usage of ggml_ssm_scan
    compilade committed 2 years ago
  • mamba : simultaneous sequence processing
    compilade committed 2 years ago
  • mamba : support llama_kv_cache_seq_cp copy chains
    compilade committed 2 years ago
  • mamba : make the server and parallel examples work with whole sequences
    compilade committed 2 years ago
  • mamba : dedicate an input tensor for state copy indices
    compilade committed 2 years ago
  • mamba : adapt perplexity, batched, and batched-bench examples
    compilade committed 2 years ago
  • mamba : stop abusing attention metadata
    compilade committed 2 years ago
  • mamba : more correctly update the "used" field of the KV cache
    compilade committed 2 years ago
  • ggml : in ggml_ssm_scan, use a threshold for soft_plus
    compilade committed 2 years ago
  • convert : for Mamba, fallback to internal NeoX tokenizer
    compilade committed 2 years ago
  • mamba : support state saving and restoring
    compilade committed 2 years ago
  • ggml : implicitly pass src tensors through dst for Mamba-related ops
    compilade committed 2 years ago
  • mamba : clarify some comments
    compilade committed 2 years ago
  • Merge branch 'master' into support-mamba-ssm
    compilade committed 2 years ago
  • Merge branch 'master' into support-mamba-ssm
    compilade committed 2 years ago
  • server : fix cache_tokens not getting correctly resized
    compilade committed 2 years ago
  • convert-hf : support new metadata keys for Mamba
    compilade committed 2 years ago
  • mamba : rename metadata to be more similar to transformers library
    compilade committed 2 years ago
  • mamba : add missing spaces
    compilade committed 2 years ago
  • convert-hf : omit output.weight when identical with token_embd.weight
    compilade committed 2 years ago
  • readme : add Mamba to supported models, and add recent API changes
    compilade committed 2 years ago
  • mamba : move state_seq and state_mask views outside layer loop
    compilade committed 2 years ago
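The commits above build up a fused `ggml_ssm_scan` operator for Mamba's selective state space recurrence, including a numerically safe softplus (the "use a threshold for soft_plus" commit). As a rough orientation only, here is a minimal NumPy sketch of the sequential selective scan those commits implement; the function and tensor names are illustrative, not the actual ggml API:

```python
import numpy as np

def softplus(x, threshold=20.0):
    # Illustrative version of the thresholded softplus: for large x,
    # softplus(x) ~= x, so skipping exp() avoids overflow.
    return np.where(x > threshold, x,
                    np.log1p(np.exp(np.minimum(x, threshold))))

def selective_scan(x, dt, A, B, C, D):
    """Sequential selective SSM scan for one sequence (hypothetical sketch).

    x  : (T, d_inner)       input activations
    dt : (T, d_inner)       per-token step sizes (before softplus)
    A  : (d_inner, d_state) state-transition parameters
    B  : (T, d_state)       input projection, selected per token
    C  : (T, d_state)       output projection, selected per token
    D  : (d_inner,)         skip connection
    """
    T, d_inner = x.shape
    d_state = A.shape[1]
    h = np.zeros((d_inner, d_state))  # recurrent state carried between tokens
    y = np.empty((T, d_inner))
    delta = softplus(dt)
    for t in range(T):
        dA = np.exp(delta[t][:, None] * A)                    # discretized A
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]
        h = dA * h + dBx                                      # state update
        y[t] = h @ C[t] + D * x[t]                            # output projection
    return y, h
```

The returned `h` is the part the PR stores in (and restores from) repurposed KV-cache cells, which is why several commits deal with state copy indices and per-sequence state handling.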