llama.cpp
llama-server : implement universal assisted decoding
#12635

Merged

Commits

llama-server : implement universal assisted decoding

g2mt committed 1 year ago
Merge branch 'master' into master

g2mt committed 1 year ago
Merge remote-tracking branch 'fork/master' into universal-decoding

g2mt committed 330 days ago
Erase prompt tail for kv-cache

g2mt committed 330 days ago
set vocab_dft_compatible in common_speculative

g2mt committed 330 days ago
rename ctx_main to ctx_tgt

g2mt committed 330 days ago
move vocab_dft_compatible to spec struct

g2mt committed 330 days ago
clear mem_dft, remove mem

g2mt committed 330 days ago
detokenize id_last for incompatible models

g2mt committed 330 days ago
update comment

g2mt committed 330 days ago
add --spec-replace flag

g2mt committed 330 days ago
accept special tokens when translating between draft/main models

g2mt committed 330 days ago
Merge remote-tracking branch 'upstream/master'

g2mt committed 330 days ago
Escape spec-replace

g2mt committed 330 days ago
Merge branch 'ggml-org:master' into master

g2mt committed 326 days ago
clamp draft result to size to params.n_draft

g2mt committed 326 days ago
Merge branch 'ggml-org:master' into master

g2mt committed 320 days ago
Merge branch 'ggml-org:master' into master

g2mt committed 314 days ago
Merge branch 'ggml-org:master' into master

g2mt committed 300 days ago
fix comment

g2mt committed 300 days ago
clean up code

g2mt committed 299 days ago
restore old example

g2mt committed 299 days ago
log common_speculative_are_compatible in speculative example

g2mt committed 299 days ago
fix

g2mt committed 299 days ago
Update common/speculative.cpp

g2mt committed 298 days ago
Update common/speculative.cpp

g2mt committed 298 days ago
Update common/speculative.cpp

g2mt committed 298 days ago
