llama.cpp
llama : add high-throughput mode #14363 (Merged)
Commits (17)
- kv-cache : prepare K/V buffers for separation (ggerganov, 166 days ago)
- batched-bench : fix oob write (ggerganov, 166 days ago)
- llama : add "virtual sequences" (ggerganov, 166 days ago)
- llama : use "stream" vs "virtual sequence" (ggerganov, 166 days ago)
- graph : fix stream splitting when KV cache is not used (ggerganov, 166 days ago)
- kv-cache : add multi-stream save/load support (ggerganov, 166 days ago)
- llama : add "--attn-streams" flag (ggerganov, 166 days ago)
- kv-cache : fix handling when find_slot fails (ggerganov, 166 days ago)
- kv-cache : restore find_slot impl (ggerganov, 166 days ago)
- kv-cache : add comments (ggerganov, 166 days ago)
- kv-cache : add bounds checks for sequence id (ggerganov, 166 days ago)
- cont : add n_seq_max to batch allocr (ggerganov, 166 days ago)
- kv-cache : perform stream copies lazily after llama_synchronize (ggerganov, 166 days ago)
- kv-cache : avoid throwing exceptions across the C boundary (ggerganov, 166 days ago)
- CUDA: 4D FlashAttention support (#14628) (ggerganov, 166 days ago)
- llama : rename attn_streams -> kv_unified (ggerganov, 164 days ago)
- common : rename kv_split -> kv_unified (ggerganov, 163 days ago)