llama.cpp
llama : add high-throughput mode
#14363
Merged

ggerganov merged 17 commits into master from gg/llama-high-throughput
github-actions added examples
github-actions added ggml
github-actions added Apple Metal
ggerganov force pushed to 1b74b9d7 174 days ago
ggerganov force pushed from 61795789 to dfceb012 166 days ago
Base automatically changed from gg/kv-cache-use-set-rows to master 165 days ago
ggerganov force pushed from dfceb012 165 days ago
ggerganov force pushed 165 days ago
ggerganov force pushed 165 days ago
ggerganov force pushed 165 days ago
ggerganov force pushed to dbcfcaae 165 days ago
compilade commented on 2025-07-03
ggerganov force pushed from dbcfcaae 164 days ago
ggerganov force pushed 164 days ago
ggerganov force pushed to 7b004292 164 days ago
ggerganov force pushed 164 days ago
ggerganov force pushed 164 days ago
ggerganov marked this pull request as ready for review 164 days ago
ggerganov force pushed to 5c00eb22 164 days ago
slaren commented on 2025-07-04
ggerganov force pushed 161 days ago
ggerganov force pushed 161 days ago
ggerganov force pushed 161 days ago
ggerganov force pushed to f23950a6 158 days ago
ggerganov force pushed from f23950a6 to ab82dc20 157 days ago
ggerganov requested a review from JohannesGaessler 157 days ago
ggerganov added hot
github-actions added testing
github-actions added Nvidia GPU
ggerganov kv-cache : prepare K/V buffers for separation
be82648b
ggerganov batched-bench : fix oob write
5a354755
ggerganov llama : add "virtual sequences"
45ecf841
ggerganov llama : use "stream" vs "virtual sequence"
4c2d6510
ggerganov graph : fix stream splitting when KV cache is not used
0d05acd6
ggerganov kv-cache : add multi-stream save/load support
247015ee
ggerganov llama : add "--attn-streams" flag
3354ce7e
ggerganov kv-cache : fix handling when find_slot fails
18fb95dd
ggerganov kv-cache : restore find_slot impl
cbe971ae
ggerganov kv-cache : add comments
1b4fbc8f
ggerganov kv-cache : add bounds checks for sequence id
8bf7fec0
ggerganov cont : add n_seq_max to batch allocr
91751ead
ggerganov kv-cache : perform stream copies lazily after llama_synchronize
2d08a395
ggerganov kv-cache : avoid throwing exceptions across the C boundary
69169b15
JohannesGaessler CUDA: 4D FlashAttention support (#14628)
886d3f15
ggerganov force pushed to 886d3f15 156 days ago
ggerganov llama : rename attn_streams -> kv_unified
fb8150d8
slaren approved these changes on 2025-07-16
ggerganov common : rename kv_split -> kv_unified
318c4f8f
ggerganov merged 225e7a14 into master 152 days ago
ggerganov deleted the gg/llama-high-throughput branch 152 days ago
