llama.cpp
c1596f63 - llama : fix kv cache heuristic when context is less than 32

Commit

2 years ago

llama : fix kv cache heuristic when context is less than 32

References

#3228 - llama : custom attention mask + parallel decoding + no context swaps

Author

ggerganov