llama.cpp
1123f7fb
- ggml-cuda : use graph allocator (#2684)
Committed 2 years ago
ggml-cuda : use graph allocator (#2684)

use a different function for no_alloc to avoid breaking backwards compat, fixes lora

remove 512 n_batch limit

fixed 2048 batch size

cleanup

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
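For context, the graph allocator referenced here follows a measure-then-allocate pattern: a measure allocator first walks the compute graph to determine the peak buffer size needed when intermediate tensors share memory, and a real allocator then places every tensor inside one pre-sized buffer. The sketch below is a minimal illustration of that pattern, assuming the `ggml-alloc.h` API from around the time of this commit (`ggml_allocr_new_measure`, `ggml_allocr_alloc_graph`, `ggml_allocr_new`); `build_graph`, the context sizes, and the tensor shapes are hypothetical stand-ins for a real model graph, and exact signatures have changed in later ggml versions.

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: builds a tiny example graph (one mat-mul) in the
// given no_alloc context; a real backend builds the full model graph here.
static struct ggml_cgraph * build_graph(struct ggml_context * ctx) {
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 64);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 64);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, ggml_mul_mat(ctx, a, b));
    return gf;
}

static struct ggml_context * new_ctx(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 1024 * 1024,  // metadata only: tensors carry no data
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,         // data placement is the allocator's job
    };
    return ggml_init(params);
}

int main(void) {
    const size_t alignment = 32;

    // 1. Measure pass: walk the graph with a measure allocator to find the
    //    peak memory needed when intermediate tensors are reused.
    struct ggml_context * ctx0 = new_ctx();
    struct ggml_allocr * measure = ggml_allocr_new_measure(alignment);
    size_t mem_size = ggml_allocr_alloc_graph(measure, build_graph(ctx0)) + alignment;
    ggml_allocr_free(measure);
    ggml_free(ctx0);

    // 2. Real pass: rebuild the graph and place every tensor inside one
    //    pre-allocated buffer of the measured size.
    void * buf = malloc(mem_size);
    struct ggml_context * ctx1 = new_ctx();
    struct ggml_allocr * alloc = ggml_allocr_new(buf, mem_size, alignment);
    ggml_allocr_alloc_graph(alloc, build_graph(ctx1));

    printf("graph fits in %zu bytes\n", mem_size);

    // Per-eval reuse: ggml_allocr_reset(alloc), rebuild the graph, and call
    // ggml_allocr_alloc_graph again on the same buffer.
    ggml_allocr_free(alloc);
    ggml_free(ctx1);
    free(buf);
    return 0;
}
```

Sizing the buffer from an exact measure pass, rather than from a fixed worst-case scratch size, is the design choice that allows hard-coded limits like the 512 n_batch cap mentioned in the commit message to be removed.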
References
#2684 - ggml-cuda : use graph allocator
Author
slaren
Parents
ef3f333d