llama.cpp
llama : add high-throughput mode
#14363
Merged

ggerganov merged 17 commits into master from gg/llama-high-throughput
github-actions added examples
github-actions added ggml
github-actions added Apple Metal
ggerganov force pushed to 1b74b9d7 207 days ago
ggerganov force pushed from 61795789 to dfceb012 199 days ago
Base automatically changed from gg/kv-cache-use-set-rows to master 198 days ago
ggerganov force pushed from dfceb012 198 days ago
ggerganov force pushed 198 days ago
ggerganov force pushed 198 days ago
ggerganov force pushed 198 days ago
ggerganov force pushed to dbcfcaae 198 days ago
compilade commented on 2025-07-03
ggerganov force pushed from dbcfcaae 197 days ago
ggerganov force pushed 197 days ago
ggerganov force pushed to 7b004292 197 days ago
ggerganov force pushed 197 days ago
ggerganov force pushed 197 days ago
ggerganov marked this pull request as ready for review 197 days ago
ggerganov force pushed to 5c00eb22 197 days ago
slaren commented on 2025-07-04
ggerganov force pushed 194 days ago
ggerganov force pushed 194 days ago
ggerganov force pushed 194 days ago
ggerganov force pushed to f23950a6 191 days ago
ggerganov force pushed from f23950a6 to ab82dc20 190 days ago
ggerganov requested a review from JohannesGaessler 190 days ago
ggerganov added hot
github-actions added testing
github-actions added Nvidia GPU
ggerganov kv-cache : prepare K/V buffers for separation
be82648b
ggerganov batched-bench : fix oob write
5a354755
ggerganov llama : add "virtual sequences"
45ecf841
ggerganov llama : use "stream" vs "virtual sequence"
4c2d6510
ggerganov graph : fix stream splitting when KV cache is not used
0d05acd6
ggerganov kv-cache : add multi-stream save/load support
247015ee
ggerganov llama : add "--attn-streams" flag
3354ce7e
ggerganov kv-cache : fix handling when find_slot fails
18fb95dd
ggerganov kv-cache : restore find_slot impl
cbe971ae
ggerganov kv-cache : add comments
1b4fbc8f
ggerganov kv-cache : add bounds checks for sequence id
8bf7fec0
ggerganov cont : add n_seq_max to batch allocr
91751ead
ggerganov kv-cache : perform stream copies lazily after llama_synchronize
2d08a395
ggerganov kv-cache : avoid throwing exceptions across the C boundary
69169b15
JohannesGaessler CUDA: 4D FlashAttention support (#14628)
886d3f15
ggerganov force pushed to 886d3f15 189 days ago
ggerganov llama : rename attn_streams -> kv_unified
fb8150d8
slaren approved these changes on 2025-07-16
ggerganov common : rename kv_split -> kv_unified
318c4f8f
ggerganov merged 225e7a14 into master 185 days ago
ggerganov deleted the gg/llama-high-throughput branch 185 days ago
