llama.cpp
91544948
- CUDA: mul_mat_id always on GPU for batches >= 32 (#4553)
Commit
1 year ago
References
#4553 - CUDA: faster Mixtral prompt processing for partial offloading
Author
JohannesGaessler
Parents
c083718c
Files
1
ggml-cuda.cu