llama.cpp
06a92a19
- server : fix cache reuse logic (#12161)
218 days ago
server : fix cache reuse logic (#12161)

The first KV shift offsets the positions of all tokens after head_c. If llama_kv_cache_seq_rm is then called with the original head_c, it removes valid tokens, because their positions have already been offset.
References
#12161 - Server: Cache position calculation error (#12160)
Author
Clauszy
Parents
a057897a