llama.cpp
llama : add high-throughput mode
#14363
Merged
ggerganov merged 17 commits into master from gg/llama-high-throughput
github-actions added labels: examples, ggml, Apple Metal
ggerganov force-pushed to 1b74b9d7 (174 days ago)
ggerganov force-pushed from 61795789 to dfceb012 (166 days ago)
Base automatically changed from gg/kv-cache-use-set-rows to master (165 days ago)
ggerganov force-pushed 5 times, from dfceb012 to dbcfcaae (165 days ago)
compilade commented on 2025-07-03
ggerganov force-pushed 5 times, from dbcfcaae to 7b004292 (164 days ago)
ggerganov marked this pull request as ready for review (164 days ago)
ggerganov force-pushed to 5c00eb22 (164 days ago)
slaren commented on 2025-07-04
ggerganov force-pushed 3 times (161 days ago)
ggerganov force-pushed to f23950a6 (158 days ago)
ggerganov force-pushed from f23950a6 to ab82dc20 (157 days ago)
ggerganov requested a review from JohannesGaessler (157 days ago)
ggerganov added label: hot
github-actions added labels: testing, Nvidia GPU
Commits:
be82648b  kv-cache : prepare K/V buffers for separation
5a354755  batched-bench : fix oob write
45ecf841  llama : add "virtual sequences"
4c2d6510  llama : use "stream" vs "virtual sequence"
0d05acd6  graph : fix stream splitting when KV cache is not used
247015ee  kv-cache : add multi-stream save/load support
3354ce7e  llama : add "--attn-streams" flag
18fb95dd  kv-cache : fix handling when find_slot fails
cbe971ae  kv-cache : restore find_slot impl
1b4fbc8f  kv-cache : add comments
8bf7fec0  kv-cache : add bounds checks for sequence id
91751ead  cont : add n_seq_max to batch allocr
2d08a395  kv-cache : perform stream copies lazily after llama_synchronize
69169b15  kv-cache : avoid throwing exceptions across the C boundary
886d3f15  CUDA: 4D FlashAttention support (#14628)
ggerganov force-pushed to 886d3f15 (156 days ago)
fb8150d8  llama : rename attn_streams -> kv_unified
slaren approved these changes on 2025-07-16
318c4f8f  common : rename kv_split -> kv_unified
ggerganov merged 225e7a14 into master (152 days ago)
ggerganov deleted the gg/llama-high-throughput branch (152 days ago)
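The commits above center on splitting the KV cache into per-sequence "streams" instead of one unified buffer shared by all parallel sequences. As a rough, hypothetical illustration of the sizing trade-off (our own arithmetic; the function, shapes, and numbers below are illustrative and not taken from the PR), splitting a fixed context budget across n_seq streams keeps the total KV-cache footprint unchanged, while each stream's attention only ever scans its own slice of cells:

```python
# Hypothetical KV-cache sizing sketch: unified buffer vs per-sequence streams.
# All dimensions are illustrative defaults, not values from the PR.

def kv_cache_bytes(n_ctx, n_layer, n_head_kv, head_dim, bytes_per_elem=2):
    # K and V each hold n_ctx * n_layer * n_head_kv * head_dim elements (fp16 here)
    return 2 * n_ctx * n_layer * n_head_kv * head_dim * bytes_per_elem

n_seq = 4     # parallel sequences
n_ctx = 8192  # total context budget (cells), shared or split

# Unified: one buffer of n_ctx cells, shared by all sequences.
unified = kv_cache_bytes(n_ctx, n_layer=32, n_head_kv=8, head_dim=128)

# Streams: n_seq buffers of n_ctx // n_seq cells each.
# Same total memory, but each stream's attention is over its own slice only.
streams = n_seq * kv_cache_bytes(n_ctx // n_seq, n_layer=32, n_head_kv=8, head_dim=128)

assert unified == streams
print(unified // (1024 * 1024), "MiB")  # prints: 1024 MiB
```

The throughput gain in this mode comes not from memory savings but from the per-stream layout: attention for each sequence can run over a contiguous, private K/V region, which is what the "4D FlashAttention" and stream-splitting commits enable on the backend side.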
Reviewers: slaren, compilade, JohannesGaessler
Assignees: no one assigned
Labels: testing, Nvidia GPU, examples, ggml, Apple Metal, hot
Milestone: no milestone