llama.cpp
ce2d995a - server : clear the KV cache beyond n_past before llama_decode
Commit (2 years ago)
server : clear the KV cache beyond n_past before llama_decode
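The gist of the fix: after #3228, llama_decode no longer implicitly discards old cache entries via context swaps, so the server must evict everything past the reused prefix (n_past) itself; otherwise stale KV entries from a previous, longer generation overlap the newly decoded tokens. A minimal sketch of the pattern, assuming the post-#3228 C API (llama_kv_cache_seq_rm and the single-sequence llama_batch_get_one helper), with ctx, tokens, and n_past standing in for the state the server already tracks:

    // Evict cached entries for sequence 0 at positions [n_past, end).
    // p1 = -1 is the "up to the end" convention used in llama.cpp examples.
    llama_kv_cache_seq_rm(ctx, 0, n_past, -1);

    // Decode only the tokens not covered by the reused prefix,
    // starting at position n_past in sequence 0.
    llama_batch batch = llama_batch_get_one(
        tokens.data() + n_past,
        (int32_t) tokens.size() - n_past,
        n_past, 0);

    if (llama_decode(ctx, batch) != 0) {
        // handle decode failure
    }

Without the eviction, the cache can still hold entries at positions >= n_past from the earlier request, and attention would see both the old and the new tokens at those positions.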
References: #3228 - llama : custom attention mask + parallel decoding + no context swaps
Author: ggerganov
Parent: 2b8830af