llama.cpp
fb43d5e8 - ggml-cuda : cleanup TQ2_0

Commit
303 days ago
ggml-cuda : cleanup TQ2_0

This also removes the custom TQ2_0 mmq dp4a, because re-using the one from Q8_0 avoids repeatedly unpacking the 2-bit values to 8-bit; the unpacking is instead done only once per tile.
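To illustrate the idea in the message, here is a minimal host-side C++ sketch of "unpack once per tile, then reuse a generic 8-bit dot product". It is not the code from this commit and not the real ggml-cuda kernel. The names unpack_tile, dp4a_ref, dot_q8 and tile_dot are hypothetical, and the packing (four 2-bit values per byte, lowest bits first, stored value q meaning weight q - 1) is a simplifying assumption rather than the actual TQ2_0 block layout. Scales are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed packing (illustration only): four 2-bit values per byte,
// lowest bits first. A stored value q in {0, 1, 2} encodes weight q - 1.
std::vector<int8_t> unpack_tile(const std::vector<uint8_t> & packed) {
    std::vector<int8_t> out;
    out.reserve(packed.size() * 4);
    for (uint8_t byte : packed) {
        for (int shift = 0; shift < 8; shift += 2) {
            out.push_back(static_cast<int8_t>(((byte >> shift) & 3) - 1));
        }
    }
    return out;
}

// Scalar stand-in for one dp4a step: four int8 products added to an
// int32 accumulator.
int32_t dp4a_ref(const int8_t * a, const int8_t * b, int32_t acc) {
    for (int i = 0; i < 4; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}

// Generic int8 x int8 dot product. It knows nothing about 2-bit data,
// so any format that can be expanded to int8 could share it.
int32_t dot_q8(const std::vector<int8_t> & w, const std::vector<int8_t> & x) {
    assert(w.size() == x.size() && w.size() % 4 == 0);
    int32_t acc = 0;
    for (std::size_t i = 0; i < w.size(); i += 4) {
        acc = dp4a_ref(&w[i], &x[i], acc);
    }
    return acc;
}

// The weight tile is unpacked a single time and then reused for every
// activation column, instead of being unpacked inside each dot product.
std::vector<int32_t> tile_dot(const std::vector<uint8_t> & packed_w,
                              const std::vector<std::vector<int8_t>> & cols) {
    const std::vector<int8_t> w = unpack_tile(packed_w);
    std::vector<int32_t> out;
    out.reserve(cols.size());
    for (const auto & x : cols) {
        out.push_back(dot_q8(w, x));
    }
    return out;
}
```

In this sketch the cost of unpack_tile is paid once per tile, while dot_q8 runs once per column; a format-specific dot product would instead repeat the shift-and-mask work for every column.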