llama.cpp
llama : add high-throughput mode #14363 (Merged)
ggerganov merged 17 commits into master from gg/llama-high-throughput
github-actions added the examples, ggml, and Apple Metal labels
ggerganov force-pushed from ab2a2bb1 to 1b74b9d7 (78 days ago)
ggerganov force-pushed from 61795789 to dfceb012 (70 days ago)
Base automatically changed from gg/kv-cache-use-set-rows to master (69 days ago)
ggerganov force-pushed from dfceb012 to eb5856cd (69 days ago)
ggerganov force-pushed from eb5856cd to ee0f729a (69 days ago)
ggerganov force-pushed from ee0f729a to deae7cda (69 days ago)
ggerganov force-pushed from deae7cda to 988d0cd8 (69 days ago)
ggerganov force-pushed from 988d0cd8 to dbcfcaae (69 days ago)
compilade commented on 2025-07-03
ggerganov force-pushed from dbcfcaae to 33dcc3c9 (68 days ago)
ggerganov force-pushed from 33dcc3c9 to 53638179 (68 days ago)
ggerganov force-pushed from 53638179 to 7b004292 (68 days ago)
ggerganov force-pushed from d2415830 to 4a0ec58d (68 days ago)
ggerganov force-pushed from d04f8241 to fa2573e3 (68 days ago)
ggerganov marked this pull request as ready for review (68 days ago)
ggerganov force-pushed from c96c48c6 to 5c00eb22 (68 days ago)
slaren commented on 2025-07-04
ggerganov force-pushed from a00dba75 to ffe7f637 (65 days ago)
ggerganov force-pushed from ffe7f637 to 832cd921 (65 days ago)
ggerganov force-pushed from d69b376b to 2aa6fa09 (65 days ago)
ggerganov force-pushed from 2aa6fa09 to f23950a6 (62 days ago)
ggerganov force-pushed from f23950a6 to ab82dc20 (61 days ago)
ggerganov requested a review from JohannesGaessler (61 days ago)
ggerganov added the hot label
github-actions added the testing and Nvidia GPU labels
kv-cache : prepare K/V buffers for separation (be82648b)
batched-bench : fix oob write (5a354755)
llama : add "virtual sequences" (45ecf841)
llama : use "stream" vs "virtual sequence" (4c2d6510)
graph : fix stream splitting when KV cache is not used (0d05acd6)
kv-cache : add multi-stream save/load support (247015ee)
llama : add "--attn-streams" flag (3354ce7e)
kv-cache : fix handling when find_slot fails (18fb95dd)
kv-cache : restore find_slot impl (cbe971ae)
kv-cache : add comments (1b4fbc8f)
kv-cache : add bounds checks for sequence id (8bf7fec0)
cont : add n_seq_max to batch allocr (91751ead)
kv-cache : perform stream copies lazily after llama_synchronize (2d08a395)
kv-cache : avoid throwing exceptions across the C boundary (69169b15)
CUDA: 4D FlashAttention support (#14628) (886d3f15)
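One of the commits above, "kv-cache : avoid throwing exceptions across the C boundary", refers to a general constraint: llama.cpp exposes a C ABI, and letting a C++ exception unwind through an `extern "C"` entry point is undefined behavior. The usual fix is to catch at the boundary and report an error code instead. A minimal sketch of that pattern (the function name and error code below are hypothetical illustrations, not llama.cpp's actual API):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C ABI entry point: internal C++ code may throw, but the
// exception must be caught here and converted into a return code so it
// never propagates across the C boundary to the caller.
extern "C" int demo_state_load(int seq_id) {
    try {
        if (seq_id < 0) {
            throw std::runtime_error("invalid sequence id");
        }
        return 0;  // success
    } catch (const std::exception &) {
        return -1; // failure reported as an error code, not an exception
    }
}
```

The same wrapper shape applies to any public entry point whose implementation uses throwing C++ code internally.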
ggerganov force-pushed from c43f275d to 886d3f15 (60 days ago)
llama : rename attn_streams -> kv_unified (fb8150d8)
slaren approved these changes on 2025-07-16
common : rename kv_split -> kv_unified (318c4f8f)
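The "virtual sequences" / streams commits and the `kv_unified` rename concern how KV-cache storage relates to sequences: in split mode each sequence gets its own independent slice (stream) of the cache, while in unified mode all sequences share one stream. A hedged sketch of the split-vs-unified indexing idea (the function, names, and layout are assumptions for illustration, not the actual llama.cpp implementation):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a KV cache of n_ctx_total cells carved into
// n_stream equal slices. With split streams (n_stream > 1), each stream
// writes only into its own slice, so sequences can be processed as
// independent attention streams; with a unified cache (n_stream == 1),
// every sequence shares the single slice.
size_t kv_cell_index(size_t n_ctx_total, size_t n_stream,
                     size_t stream_id, size_t pos) {
    const size_t cells_per_stream = n_ctx_total / n_stream;
    assert(stream_id < n_stream && pos < cells_per_stream);
    return stream_id * cells_per_stream + pos;
}
```

Under this layout, per-stream slices never overlap, which is what lets attention over different sequences run without masking against each other's cells.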
ggerganov merged 225e7a14 into master (56 days ago)
ggerganov deleted the gg/llama-high-throughput branch (56 days ago)
Reviewers: slaren, compilade, JohannesGaessler
Assignees: no one assigned
Labels: testing, Nvidia GPU, examples, ggml, Apple Metal, hot
Milestone: no milestone