llama.cpp
llama : support Mamba Selective State Space Models #5328 (Merged)

Commits
  • mamba : begin working on support for Mamba SSM
    compilade committed 2 years ago
  • mamba : begin figuring out how to (ab)use the kv cache for Mamba
    compilade committed 2 years ago
  • mamba : recurrent inference almost works, but incoherent
    compilade committed 2 years ago
  • mamba : recurrent inference WORKS!!!
    compilade committed 2 years ago
  • convert : optionally use d_conv and d_state from config.json for Mamba
    compilade committed 2 years ago
  • mamba : refactor recurrent conv, resulting in 20% perf increase
    compilade committed 2 years ago
  • ggml : parallelize ggml_exp
    compilade committed 2 years ago
  • mamba : simplify the conv step with a self-overlapping view
    compilade committed 2 years ago
  • mamba : fix self-overlapping view depth stride
    compilade committed 2 years ago
  • mamba : handle batches of more than 1 token
    compilade committed 2 years ago
  • ggml : in ggml_ssm_scan, merge multiple rows in the same vec operation
    compilade committed 2 years ago
  • mamba : very basic quantization support
    compilade committed 2 years ago
  • mamba : fuse more steps of the SSM scan in the ggml_ssm_scan operator
    compilade committed 2 years ago
  • convert : for Mamba, also consider the "MambaLMHeadModel" arch name
    compilade committed 2 years ago
  • mamba : fix vocab size problems with official models
    compilade committed 2 years ago
  • ggml : remove ggml_exp and ggml_soft_plus
    compilade committed 2 years ago
  • mamba : remove some useless comments
    compilade committed 2 years ago
  • convert : fix flake8 linter errors
    compilade committed 2 years ago
  • mamba : apply suggestions from code review
    compilade committed 2 years ago
  • mamba : multiple sequences, but one at a time
    compilade committed 2 years ago
  • mamba : in comments, properly refer to KV cells instead of slots
    compilade committed 2 years ago
  • mamba : reduce memory usage of ggml_ssm_scan
    compilade committed 2 years ago
  • mamba : simultaneous sequence processing
    compilade committed 2 years ago
  • mamba : support llama_kv_cache_seq_cp copy chains
    compilade committed 2 years ago
  • mamba : make the server and parallel examples work with whole sequences
    compilade committed 2 years ago
  • mamba : dedicate an input tensor for state copy indices
    compilade committed 2 years ago
  • mamba : adapt perplexity, batched, and batched-bench examples
    compilade committed 2 years ago
  • mamba : stop abusing attention metadata
    compilade committed 2 years ago
  • mamba : more correctly update the "used" field of the KV cache
    compilade committed 2 years ago
  • ggml : in ggml_ssm_scan, use a threshold for soft_plus
    compilade committed 2 years ago
  • convert : for Mamba, fallback to internal NeoX tokenizer
    compilade committed 2 years ago
  • mamba : support state saving and restoring
    compilade committed 2 years ago
  • ggml : implicitly pass src tensors through dst for Mamba-related ops
    compilade committed 2 years ago
  • mamba : clarify some comments
    compilade committed 2 years ago
  • Merge branch 'master' into support-mamba-ssm
    compilade committed 2 years ago
  • Merge branch 'master' into support-mamba-ssm
    compilade committed 2 years ago
  • server : fix cache_tokens not getting correctly resized
    compilade committed 2 years ago
  • convert-hf : support new metadata keys for Mamba
    compilade committed 2 years ago
  • mamba : rename metadata to be more similar to transformers library
    compilade committed 2 years ago
  • mamba : add missing spaces
    compilade committed 2 years ago
  • convert-hf : omit output.weight when identical with token_embd.weight
    compilade committed 2 years ago
  • readme : add Mamba to supported models, and add recent API changes
    compilade committed 2 years ago
  • mamba : move state_seq and state_mask views outside layer loop
    compilade committed 2 years ago
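The commits above build up a fused `ggml_ssm_scan` operator for Mamba's selective state space recurrence, including a numerically safe softplus (the "use a threshold for soft_plus" commit). As a rough orientation only, here is a minimal NumPy sketch of the sequential selective scan those commits implement; the function and tensor names are illustrative, not the actual ggml API:

```python
import numpy as np

def softplus(x, threshold=20.0):
    # Illustrative version of the thresholded softplus: for large x,
    # softplus(x) ~= x, so skipping exp() avoids overflow.
    return np.where(x > threshold, x,
                    np.log1p(np.exp(np.minimum(x, threshold))))

def selective_scan(x, dt, A, B, C, D):
    """Sequential selective SSM scan for one sequence (hypothetical sketch).

    x  : (T, d_inner)       input activations
    dt : (T, d_inner)       per-token step sizes (before softplus)
    A  : (d_inner, d_state) state-transition parameters
    B  : (T, d_state)       input projection, selected per token
    C  : (T, d_state)       output projection, selected per token
    D  : (d_inner,)         skip connection
    """
    T, d_inner = x.shape
    d_state = A.shape[1]
    h = np.zeros((d_inner, d_state))  # recurrent state carried between tokens
    y = np.empty((T, d_inner))
    delta = softplus(dt)
    for t in range(T):
        dA = np.exp(delta[t][:, None] * A)                    # discretized A
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]
        h = dA * h + dBx                                      # state update
        y[t] = h @ C[t] + D * x[t]                            # output projection
    return y, h
```

The returned `h` is the part the PR stores in (and restores from) repurposed KV-cache cells, which is why several commits deal with state copy indices and per-sequence state handling.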