llama.cpp
spec : parallel drafting support
#22838
Merged

ggerganov merged 36 commits into master from gg/spec-refactor-parallel
ggerganov spec : refactor
2c9a4084
ggerganov spec : drop support for incompatible vocabs
befc7ef6
ggerganov spec : update common_speculative_init()
4550f0f0
ggerganov cont : pass seq_id
77269ad8
ggerganov cont : dedup ctx_seq_rm_type
8a50f6f0
ggerganov server : sketch the ctx_dft decode loop
c97dc360
ggerganov server : draft prompt cache and checkpoints
11fd5e72
ggerganov server : improve ctx names
1afee5b2
ggerganov server, spec : transition to unified spec context
de35b125
ggerganov cont : sync main and drft contexts
08c8012b
ggerganov cont : async drft eval when possible
c7facb0f
ggerganov cont : handle non-ckpt models
0239f4c6
ggerganov cont : pass correct n_past for drafting
ae6703fa
ggerganov cont : process images through the draft context
7e118cdc
ggerganov spec : handle draft running out of context
8be14e40
ggerganov server : fix mtmd draft processing
6a4b05a0
ggerganov server : fix URL for draft model
12c7cfbe
ggerganov server : add comment
233d1aee
ggerganov server : clean-up + dry
3b1a8df8
ggerganov speculative-simple : update
e5b14013
ggerganov spec : fix n_past type
161eae0a
ggerganov server : fix slot ctx_drft ptr
1dbc054d
ggerganov tools : update readme
778f9e24
ggerganov naming : improve consistency
efa2f8e5
ggerganov spec : refactor for multi-sequence speculative context
6582523e
ggerganov changed the title from "spec : refactor for multi-sequence speculative context" to "spec : parallel drafting support" 41 days ago
ggerganov cont : prepare params
8822c122
ggerganov cont : prepare params
927d6635
ggerganov spec : support parallel drafts
f88c9428
ggerganov server : support parallel drafting
f1652197
github-actions added the examples label
github-actions added the server label
ggerganov llama : reuse device buffers when possible
55b62bce
ggerganov marked this pull request as ready for review 40 days ago
ggerganov requested a review 40 days ago
ggerganov requested a review 40 days ago
ServeurpersoCom dismissed these changes on 2026-05-09
ggerganov server, spec : clean-up
ce0acf03
ggerganov cont : clean-up
b3bd3bd4
ggerganov cont : minor
ec8bc448
ggerganov spec : reset `drafting` flag at the end
0d5dd61d
ggerganov spec : introduce `common_speculative_process()`
db8e3269
ruixiang63 spec : allow for multiple spec types (chain of speculators)
51b5249e
ggerganov changed the base branch from gg/spec-refactor-ctx to master 38 days ago
ggerganov dismissed their stale review 38 days ago
ggerganov requested a review from ngxson 38 days ago
ggerganov merged 68e7ea3e into master 38 days ago
ggerganov deleted the gg/spec-refactor-parallel branch 38 days ago
