llama-server : implement universal assisted decoding #12635
llama-server : implement universal assisted decoding
6f962699
Merge branch 'master' into master
6f74c9c4
Merge remote-tracking branch 'fork/master' into universal-decoding
e6676458
Erase prompt tail for kv-cache
ff9e0623
set vocab_dft_compatible in common_speculative
39ca594a
rename ctx_main to ctx_tgt
eb424dd6
move vocab_dft_compatible to spec struct
2550f11f
clear mem_dft, remove mem
3c35c9d9
detokenize id_last for incompatible models
12751c9d
update comment
84199317
add --spec-replace flag
b9fdf203
accept special tokens when translating between draft/main models
160769de
Merge remote-tracking branch 'upstream/master'
ebaa82ec
g2mt
closed this 329 days ago
g2mt
reopened this 329 days ago
g2mt
marked this pull request as draft 329 days ago
g2mt
marked this pull request as ready for review 329 days ago
Escape spec-replace
d1f32aba
Merge branch 'ggml-org:master' into master
3afb5567
clamp draft result size to params.n_draft
d23892ec
Merge branch 'ggml-org:master' into master
c382c281
CISC
removed documentation
CISC
removed testing
CISC
removed android
CISC
removed Nvidia GPU
CISC
removed Apple Metal
CISC
removed Ascend NPU
Merge branch 'ggml-org:master' into master
e14bafb4
Merge branch 'ggml-org:master' into master
f8cee4e0
fix comment
2cc9e2e1
clean up code
829b7624
CISC
approved these changes on 2025-07-29
CISC
removed review request from ngxson 298 days ago
g2mt
force pushed to 829b7624 298 days ago
restore old example
b045eac6
g2mt
force pushed to 79d2be41 298 days ago
log common_speculative_are_compatible in speculative example
79d2be41
fix
6acc6814
Update common/speculative.cpp
50908f29
Update common/speculative.cpp
e866f230
Update common/speculative.cpp
24cede7e
CISC
merged 94933c8c into master 297 days ago