llama.cpp d54a4027 - CUDA: lower GPU latency + fix Windows performance (#3110)
Commit
2 years ago
CUDA: lower GPU latency + fix Windows performance (#3110)
References
#3110 - CUDA: refactor ggml_cuda_op + lower GPU latency via quantization on main GPU and tiling
Author
JohannesGaessler
Parents
1b0d0925