llama.cpp
Commit 806d397c (2 years ago)
parallel : try smaller batches when the KV cache is fragmented
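
When the KV cache is fragmented, there may be no contiguous free slot large enough for a full batch even though enough total space remains, so a decode call that fails with a large batch can still succeed with a smaller one. In the llama.cpp C API, llama_decode returns 0 on success, a positive value when it could not find a KV cache slot for the batch, and a negative value on hard errors. The sketch below shows one way such a retry loop can look; decode_with_retry and make_batch_view are hypothetical names introduced here for illustration, the halving policy is an assumption, and the exact llama_batch view construction depends on the library version.

```cpp
#include <algorithm>
#include <cstdio>

#include "llama.h"

// Hypothetical helper: build a llama_batch view over tokens [i, i + n_tokens)
// of `batch`. The field layout of llama_batch differs across library
// versions, so the body is omitted here.
llama_batch make_batch_view(const llama_batch & batch, int32_t i, int32_t n_tokens);

// Decode `batch` in chunks of at most `n_batch` tokens. If llama_decode
// returns a positive value (no free KV cache slot found), halve the chunk
// size and retry the same tokens instead of failing outright.
static int decode_with_retry(llama_context * ctx, const llama_batch & batch, int32_t n_batch) {
    for (int32_t i = 0; i < batch.n_tokens; i += n_batch) {
        const int32_t n_tokens = std::min(n_batch, batch.n_tokens - i);

        const int ret = llama_decode(ctx, make_batch_view(batch, i, n_tokens));
        if (ret == 0) {
            continue; // this chunk fit into the KV cache
        }
        if (n_batch == 1 || ret < 0) {
            // negative return values are hard errors, and at n_batch == 1
            // there is nothing smaller left to try
            fprintf(stderr, "failed to decode, n_batch = %d, ret = %d\n", n_batch, ret);
            return 1;
        }

        // the cache is likely fragmented: a smaller contiguous slot may
        // still exist, so halve the batch and re-run from the same offset
        n_batch /= 2;
        i -= n_batch; // the loop increment adds the new n_batch back
    }
    return 0;
}
```

With this policy the number of retries per chunk is bounded by log2(n_batch), so a badly fragmented cache degrades into decoding smaller chunks rather than aborting the whole batch.
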
References
#3228 - llama : custom attention mask + parallel decoding + no context swaps
Author
ggerganov
Parents
ddad2277