llama.cpp
llama-server : implement universal assisted decoding
#12635
Merged

Commits
  • llama-server : implement universal assisted decoding
    g2mt committed 1 year ago
  • Merge branch 'master' into master
    g2mt committed 1 year ago
  • Merge remote-tracking branch 'fork/master' into universal-decoding
    g2mt committed 330 days ago
  • Erase prompt tail for kv-cache
    g2mt committed 330 days ago
  • set vocab_dft_compatible in common_speculative
    g2mt committed 330 days ago
  • rename ctx_main to ctx_tgt
    g2mt committed 330 days ago
  • move vocab_dft_compatible to spec struct
    g2mt committed 330 days ago
  • clear mem_dft, remove mem
    g2mt committed 330 days ago
  • detokenize id_last for incompatible models
    g2mt committed 330 days ago
  • update comment
    g2mt committed 330 days ago
  • add --spec-replace flag
    g2mt committed 330 days ago
  • accept special tokens when translating between draft/main models
    g2mt committed 330 days ago
  • Merge remote-tracking branch 'upstream/master'
    g2mt committed 330 days ago
  • Escape spec-replace
    g2mt committed 330 days ago
  • Merge branch 'ggml-org:master' into master
    g2mt committed 326 days ago
  • clamp draft result to size to params.n_draft
    g2mt committed 326 days ago
  • Merge branch 'ggml-org:master' into master
    g2mt committed 320 days ago
  • Merge branch 'ggml-org:master' into master
    g2mt committed 314 days ago
  • Merge branch 'ggml-org:master' into master
    g2mt committed 300 days ago
  • fix comment
    g2mt committed 300 days ago
  • clean up code
    g2mt committed 299 days ago
  • restore old example
    g2mt committed 299 days ago
  • log common_speculative_are_compatible in speculative example
    g2mt committed 299 days ago
  • fix
    g2mt committed 299 days ago
  • Update common/speculative.cpp
    g2mt committed 298 days ago
  • Update common/speculative.cpp
    g2mt committed 298 days ago
  • Update common/speculative.cpp
    g2mt committed 298 days ago