llama.cpp
llama : add high-throughput mode
#14363
Merged

ggerganov merged 17 commits into master from gg/llama-high-throughput
github-actions added examples
github-actions added ggml
github-actions added Apple Metal
ggerganov force pushed from ab2a2bb1 to 1b74b9d7 78 days ago
ggerganov force pushed from 61795789 to dfceb012 70 days ago
Base automatically changed from gg/kv-cache-use-set-rows to master 69 days ago
ggerganov force pushed from dfceb012 to eb5856cd 69 days ago
ggerganov force pushed from eb5856cd to ee0f729a 69 days ago
ggerganov force pushed from ee0f729a to deae7cda 69 days ago
ggerganov force pushed from deae7cda to 988d0cd8 69 days ago
ggerganov force pushed from 988d0cd8 to dbcfcaae 69 days ago
compilade commented on 2025-07-03
ggerganov force pushed from dbcfcaae to 33dcc3c9 68 days ago
ggerganov force pushed from 33dcc3c9 to 53638179 68 days ago
ggerganov force pushed from 53638179 to 7b004292 68 days ago
ggerganov force pushed from d2415830 to 4a0ec58d 68 days ago
ggerganov force pushed from d04f8241 to fa2573e3 68 days ago
ggerganov marked this pull request as ready for review 68 days ago
ggerganov force pushed from c96c48c6 to 5c00eb22 68 days ago
slaren commented on 2025-07-04
ggerganov force pushed from a00dba75 to ffe7f637 65 days ago
ggerganov force pushed from ffe7f637 to 832cd921 65 days ago
ggerganov force pushed from d69b376b to 2aa6fa09 65 days ago
ggerganov force pushed from 2aa6fa09 to f23950a6 62 days ago
ggerganov force pushed from f23950a6 to ab82dc20 61 days ago
ggerganov requested a review from JohannesGaessler 61 days ago
ggerganov added hot
github-actions added testing
github-actions added Nvidia GPU
ggerganov kv-cache : prepare K/V buffers for separation
be82648b
ggerganov batched-bench : fix oob write
5a354755
ggerganov llama : add "virtual sequences"
45ecf841
ggerganov llama : use "stream" vs "virtual sequence"
4c2d6510
ggerganov graph : fix stream splitting when KV cache is not used
0d05acd6
ggerganov kv-cache : add multi-stream save/load support
247015ee
ggerganov llama : add "--attn-streams" flag
3354ce7e
ggerganov kv-cache : fix handling when find_slot fails
18fb95dd
ggerganov kv-cache : restore find_slot impl
cbe971ae
ggerganov kv-cache : add comments
1b4fbc8f
ggerganov kv-cache : add bounds checks for sequence id
8bf7fec0
ggerganov cont : add n_seq_max to batch allocr
91751ead
ggerganov kv-cache : perform stream copies lazily after llama_synchronize
2d08a395
ggerganov kv-cache : avoid throwing exceptions across the C boundary
69169b15
JohannesGaessler CUDA: 4D FlashAttention support (#14628)
886d3f15
ggerganov force pushed from c43f275d to 886d3f15 60 days ago
ggerganov llama : rename attn_streams -> kv_unified
fb8150d8
slaren approved these changes on 2025-07-16
ggerganov common : rename kv_split -> kv_unified
318c4f8f
ggerganov merged 225e7a14 into master 56 days ago
ggerganov deleted the gg/llama-high-throughput branch 56 days ago
