llama.cpp
85a7d867 - memory : remove KV cache size padding (#16812)
8 days ago
memory : remove KV cache size padding (#16812)

* memory : remove KV cache size padding
* cont : restore padding for n_kv tensor shape
* server : use slot context size instead of training context size
* server : simplify context limit logic
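As a rough illustration of what this commit message describes (a minimal sketch, not the actual llama.cpp code): the "padding" in question is GGML_PAD-style rounding of a size up to an alignment multiple. The commit stops padding the KV cache's allocated size while keeping padding on the n_kv tensor shape seen by the attention kernels. All names and the 256-cell alignment below are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>

// Round x up to the nearest multiple of n (n a power of two),
// mirroring ggml's GGML_PAD macro: ((x) + (n) - 1) & ~((n) - 1).
static uint32_t pad_to(uint32_t x, uint32_t n) {
    return (x + n - 1) & ~(n - 1);
}

int main() {
    const uint32_t n_ctx   = 1000; // requested context size (hypothetical)
    const uint32_t padding = 256;  // hypothetical alignment unit

    // Before: the KV cache was allocated with a padded cell count.
    uint32_t kv_size_before = pad_to(n_ctx, padding); // 1024 cells

    // After: the cache is sized exactly as requested; only the n_kv
    // tensor shape passed to the attention kernels keeps its padding.
    uint32_t kv_size_after = n_ctx;                   // 1000 cells
    uint32_t n_kv_used     = 512;                     // cells in use (example)
    uint32_t n_kv_shape    = pad_to(n_kv_used, padding);

    printf("before: %u cells, after: %u cells, n_kv shape: %u\n",
           kv_size_before, kv_size_after, n_kv_shape);
    return 0;
}
```

The server-side bullets follow the same theme: limit checks compare against the slot's own context size rather than the model's training context size, which removes a special case from the context limit logic.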
References
#16812 - memory : remove KV cache size padding
Author
ggerganov
Parents
a8ca18b4