llama.cpp
958367bf - server : refactor slot input data, move tokenizer to HTTP thread (#10023)

Commit (306 days ago)

server : refactor slot input data, move tokenizer to HTTP thread (#10023)

* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
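The commit message describes an architectural change rather than a single fix: request input is tokenized on the HTTP thread, and the slot (inference) side works only with `llama_tokens`, i.e. vectors of token ids. Below is a minimal, self-contained sketch of that flow; `fake_tokenize`, `server_task`, and `server_queue` are simplified stand-ins for illustration, not the actual definitions in `server.cpp` or `utils.hpp`.

```cpp
// Minimal sketch of the idea behind the refactor, not the actual server code.
// fake_tokenize, server_task, and server_queue are simplified stand-ins; only
// the llama_tokens typedef (a vector of token ids) matches the commit message.
#include <cstdint>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;   // the typedef used "everywhere" per the message

// Stand-in tokenizer: the real server calls into llama.cpp's tokenizer here.
static llama_tokens fake_tokenize(const std::string & text) {
    llama_tokens tokens;
    for (unsigned char c : text) {
        tokens.push_back(static_cast<llama_token>(c));
    }
    return tokens;
}

// Slots now receive pre-tokenized input instead of raw strings/JSON.
struct server_task {
    llama_tokens prompt_tokens;
};

// Simplified task queue between the HTTP thread and the inference (slot) thread.
struct server_queue {
    std::mutex             mtx;
    std::deque<server_task> tasks;

    void post(server_task task) {
        std::lock_guard<std::mutex> lock(mtx);
        tasks.push_back(std::move(task));
    }
};

// Before: the slot thread received the raw prompt and tokenized it itself.
// After:  the HTTP handler tokenizes on its own thread and hands ready-to-use
//         token ids to the slot, so input checks run before a slot is occupied.
static void handle_completion_request(server_queue & queue, const std::string & prompt) {
    server_task task;
    task.prompt_tokens = fake_tokenize(prompt);   // done on the HTTP thread

    if (task.prompt_tokens.empty()) {
        // e.g. reject an empty prompt before the task ever reaches a slot
        std::printf("rejected empty prompt\n");
        return;
    }

    queue.post(std::move(task));
}

int main() {
    server_queue queue;
    handle_completion_request(queue, "Hello, llama.cpp!");
    std::printf("queued tasks: %zu\n", queue.tasks.size());
    return 0;
}
```

Moving tokenization to the HTTP thread is what makes items like the `prompt_tokens.empty()` check and the restored infill validation possible at request-handling time, before any slot or generation loop is involved.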
Changed files
  • examples/server/README.md
  • examples/server/server.cpp
  • examples/server/tests/features/infill.feature
  • examples/server/tests/features/steps/steps.py
  • examples/server/utils.hpp