llama.cpp PR #14363 (Merged)
llama : add high-throughput mode

Commits
  • kv-cache : prepare K/V buffers for separation
    ggerganov committed 166 days ago
  • batched-bench : fix oob write
    ggerganov committed 166 days ago
  • llama : add "virtual sequences"
    ggerganov committed 166 days ago
  • llama : use "stream" vs "virtual sequence"
    ggerganov committed 166 days ago
  • graph : fix stream splitting when KV cache is not used
    ggerganov committed 166 days ago
  • kv-cache : add multi-stream save/load support
    ggerganov committed 166 days ago
  • llama : add "--attn-streams" flag
    ggerganov committed 166 days ago
  • kv-cache : fix handling when find_slot fails
    ggerganov committed 166 days ago
  • kv-cache : restore find_slot impl
    ggerganov committed 166 days ago
  • kv-cache : add comments
    ggerganov committed 166 days ago
  • kv-cache : add bounds checks for sequence id
    ggerganov committed 166 days ago
  • cont : add n_seq_max to batch allocr
    ggerganov committed 166 days ago
  • kv-cache : perform stream copies lazily after llama_synchronize
    ggerganov committed 166 days ago
  • kv-cache : avoid throwing exceptions across the C boundary
    ggerganov committed 166 days ago
  • CUDA: 4D FlashAttention support (#14628)
    ggerganov committed 166 days ago
  • llama : rename attn_streams -> kv_unified
    ggerganov committed 164 days ago
  • common : rename kv_split -> kv_unified
    ggerganov committed 163 days ago