llama.cpp
c5650ed4 - server : avoid context swaps by shifting the KV cache
Commit (2 years ago)
server : avoid context swaps by shifting the KV cache
References
custom-attention-mask
#3228 - llama : custom attention mask + parallel decoding + no context swaps
Author
ggerganov
Parents
ce2d995a
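For context on what the commit title describes: when the context window fills up, the older approach was a "context swap", re-evaluating a large part of the prompt from scratch. The change referenced here instead discards the oldest tokens after a kept prefix and shifts the positions of the remaining KV-cache entries left, so generation continues without recomputation. The snippet below is a minimal, self-contained sketch of that bookkeeping only; it is not the actual server code, and names such as `n_keep` and `n_discard` are illustrative parameters echoing the discussion in #3228.

```cpp
// Conceptual sketch (not llama.cpp source): which cached positions are dropped
// and how the survivors are renumbered when the context window is full.
#include <cstdio>
#include <vector>

int main() {
    const int n_ctx  = 8;   // tiny context window for illustration
    const int n_keep = 2;   // prefix that is always kept (e.g. system prompt)

    // Positions currently stored in the KV cache: 0 .. n_ctx-1 (cache is full).
    std::vector<int> pos(n_ctx);
    for (int i = 0; i < n_ctx; ++i) pos[i] = i;

    // Instead of a context swap, discard the oldest tokens after n_keep
    // and shift the positions of everything that follows.
    const int n_left    = n_ctx - n_keep;
    const int n_discard = n_left / 2;

    std::vector<int> shifted;
    for (int p : pos) {
        if (p < n_keep) {
            shifted.push_back(p);             // kept prefix, unchanged
        } else if (p < n_keep + n_discard) {
            continue;                         // oldest tokens, discarded
        } else {
            shifted.push_back(p - n_discard); // remaining tokens, shifted left
        }
    }

    printf("kept %zu of %d cache cells; next token goes to position %d\n",
           shifted.size(), n_ctx, shifted.back() + 1);
    return 0;
}
```

With `n_ctx = 8` and `n_keep = 2`, three of the six non-kept cells are dropped and the rest slide left, freeing room for new tokens while the cached keys and values stay in place. In the real server this position shift is applied to the KV cache itself rather than recomputing it, which is what avoids the cost of a full context swap.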