llama.cpp
06a92a19 - server : fix cache reuse logic (#12161)

Commit
218 days ago
server : fix cache reuse logic (#12161)

The first KV shift offsets the positions of all tokens after head_c. A subsequent llama_kv_cache_seq_rm call that still uses head_c then removes valid tokens, because their positions have already been offset.
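The failure mode described above can be reproduced in a toy model. The sketch below is not llama.cpp code: `ToyKVCache`, `reuse`, and the `(head_p, head_c, n_match)` chunk tuples are hypothetical names for illustration. Its `seq_rm` and `seq_add` methods only imitate llama_kv_cache_seq_rm and llama_kv_cache_seq_add, assuming half-open position ranges `[p0, p1)` and a negative `p1` meaning "no upper bound". The `bounded=True` variant is one possible way to avoid the problem in this model and is an assumption, not necessarily the exact change made by this commit.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    pos: int   # position currently stored in the (toy) KV cache
    tok: str   # token held by this cell


class ToyKVCache:
    """Hypothetical stand-in for a single-sequence KV cache."""

    def __init__(self, tokens):
        self.cells = [Cell(i, t) for i, t in enumerate(tokens)]

    def seq_rm(self, p0, p1):
        # Remove every cell whose position lies in [p0, p1).
        self.cells = [c for c in self.cells if not (p0 <= c.pos < p1)]

    def seq_add(self, p0, p1, delta):
        # Shift positions in [p0, p1) by delta; p1 < 0 means unbounded.
        for c in self.cells:
            if c.pos >= p0 and (p1 < 0 or c.pos < p1):
                c.pos += delta

    def tokens(self):
        return [c.tok for c in sorted(self.cells, key=lambda c: c.pos)]


def reuse(tokens, chunks, bounded):
    """Apply one rm + shift per reusable chunk.

    head_c indexes the ORIGINAL cached token list, head_p the new prompt.
    """
    kv = ToyKVCache(tokens)
    for head_p, head_c, n_match in chunks:
        kv_shift = head_p - head_c
        kv.seq_rm(head_p, head_c)
        # Assumption: bounding the shift to the matched chunk keeps later
        # tokens at their original positions, so head_c stays meaningful.
        p1 = head_c + n_match if bounded else -1
        kv.seq_add(head_c, p1, kv_shift)
    return kv.tokens()


cached = ["a", "b", "X", "c", "d", "Y", "Y", "e", "f"]
# New prompt is "a b c d e f": reuse "c d" (cache index 3) and "e f" (index 7).
chunks = [(2, 3, 2), (4, 7, 2)]

unbounded_result = reuse(cached, chunks, bounded=False)
bounded_result = reuse(cached, chunks, bounded=True)
print(unbounded_result)  # ['a', 'b', 'c', 'd', 'f']  ("e" was removed)
print(bounded_result)    # ['a', 'b', 'c', 'd', 'e', 'f']
```

In the unbounded run, the first shift moves "e" from position 7 to 6, so the second removal over `[4, 7)` deletes it even though it is a valid token to reuse.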