llama.cpp
spec : parallel drafting support
#22838
Merged
ggerganov merged 36 commits into master from gg/spec-refactor-parallel
spec : refactor
2c9a4084
spec : drop support for incompatible vocabs
befc7ef6
spec : update common_speculative_init()
4550f0f0
cont : pass seq_id
77269ad8
cont : dedup ctx_seq_rm_type
8a50f6f0
server : sketch the ctx_dft decode loop
c97dc360
server : draft prompt cache and checkpoints
11fd5e72
server : improve ctx names
1afee5b2
server, spec : transition to unified spec context
de35b125
cont : sync main and drft contexts
08c8012b
cont : async drft eval when possible
c7facb0f
cont : handle non-ckpt models
0239f4c6
cont : pass correct n_past for drafting
ae6703fa
cont : process images through the draft context
7e118cdc
spec : handle draft running out of context
8be14e40
server : fix mtmd draft processing
6a4b05a0
server : fix URL for draft model
12c7cfbe
server : add comment
233d1aee
server : clean-up + dry
3b1a8df8
speculative-simple : update
e5b14013
spec : fix n_past type
161eae0a
server : fix slot ctx_drft ptr
1dbc054d
tools : update readme
778f9e24
naming : improve consistency
efa2f8e5
spec : refactor for multi-sequence speculative context
6582523e
ggerganov changed the title from "spec : refactor for multi-sequence speculative context" to "spec : parallel drafting support" 41 days ago
cont : prepare params
8822c122
cont : prepare params
927d6635
spec : support parallel drafts
f88c9428
server : support parallel drafting
f1652197
github-actions added the examples label
github-actions added the server label
llama : reuse device buffers when possible
55b62bce
ggerganov marked this pull request as ready for review 40 days ago
ggerganov requested a review 40 days ago
ggerganov requested a review 40 days ago
ServeurpersoCom dismissed these changes on 2026-05-09
server, spec : clean-up
ce0acf03
cont : clean-up
b3bd3bd4
cont : minor
ec8bc448
spec : reset `drafting` flag at the end
0d5dd61d
spec : introduce `common_speculative_process()`
db8e3269
spec : allow for multiple spec types (chain of speculators)
51b5249e
ggerganov changed the base branch from gg/spec-refactor-ctx to master 38 days ago
ggerganov dismissed their stale review 38 days ago
The base branch was changed.
ggerganov requested a review from ngxson 38 days ago
ggerganov merged 68e7ea3e into master 38 days ago
ggerganov deleted the gg/spec-refactor-parallel branch 38 days ago
Reviewers: ServeurpersoCom, ngxson
Assignees: No one assigned
Labels: examples, server
Milestone: No milestone