llama.cpp
fb43d5e8
- ggml-cuda : cleanup TQ2_0
Commit
303 days ago
ggml-cuda : cleanup TQ2_0

This also removes the custom TQ2_0 mmq dp4a path, because re-using the one from Q8_0 avoids repeatedly unpacking the 2-bit values to 8-bit; the unpacking is instead done only once per tile.
References
#11183 - ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
Author
compilade
Parents
970b5ab7