llama.cpp
d924522a - Custom RoPE + better memory management for CUDA (#2295)

Commit
2 years ago
Custom RoPE + better memory management for CUDA (#2295)

* Custom RoPE + better memory management for CUDA

* Adjusted look ahead in ggml_cuda_pool_malloc to 5%

  This seems to be sufficient. We end up using about 200 MB less VRAM that way when running the 13B model with context 8192.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
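
The look-ahead idea mentioned in the message can be illustrated with a small buffer pool. This is only a sketch under stated assumptions, not the code from this commit: `std::malloc`/`std::free` stand in for the CUDA allocator so the example runs on the host, and the names `look_ahead_size`, `pool_malloc`, `pool_free`, `kPoolSlots`, and the 256-byte rounding are hypothetical choices made for illustration. Only the 5% figure comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical names and constants; std::malloc stands in for a device allocator.
constexpr int         kPoolSlots = 256;  // assumed pool capacity
constexpr std::size_t kAlign     = 256;  // assumed rounding granularity

struct PoolBuffer {
    void*       ptr;
    std::size_t size;
};

// Static storage is zero-initialized, so every slot starts out empty.
static PoolBuffer g_pool[kPoolSlots];

// Grow a request by 5% and round the result up to a multiple of kAlign.
std::size_t look_ahead_size(std::size_t size) {
    std::size_t padded = size + size / 20;  // 5% look ahead
    return kAlign * ((padded + kAlign - 1) / kAlign);
}

// Return a buffer of at least `size` bytes; *actual_size receives its capacity.
void* pool_malloc(std::size_t size, std::size_t* actual_size) {
    int best = -1;
    // Best fit: the smallest cached buffer that is still large enough.
    for (int i = 0; i < kPoolSlots; ++i) {
        const PoolBuffer& b = g_pool[i];
        if (b.ptr != nullptr && b.size >= size &&
            (best < 0 || b.size < g_pool[best].size)) {
            best = i;
        }
    }
    if (best >= 0) {
        void* ptr = g_pool[best].ptr;
        *actual_size = g_pool[best].size;
        g_pool[best] = PoolBuffer{nullptr, 0};  // slot is free again
        return ptr;
    }
    // Nothing reusable: allocate slightly more than requested so that a
    // somewhat larger request later can reuse this buffer.
    *actual_size = look_ahead_size(size);
    return std::malloc(*actual_size);
}

// Hand a buffer back to the pool; release it if every slot is occupied.
void pool_free(void* ptr, std::size_t size) {
    for (int i = 0; i < kPoolSlots; ++i) {
        if (g_pool[i].ptr == nullptr) {
            g_pool[i] = PoolBuffer{ptr, size};
            return;
        }
    }
    std::free(ptr);
}
```

In this sketch the look-ahead factor trades a little extra memory per allocation for a higher chance of reuse when request sizes grow gradually. A smaller factor wastes less memory per buffer, which is consistent with the VRAM saving the message reports, though the exact trade-off depends on the workload.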