llama.cpp
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
#11183
Open

Commits
  • ggml-cuda : add TQ2_0 support
    compilade committed 1 year ago
  • ggml-cuda : cleanup TQ2_0
    compilade committed 1 year ago
  • Merge branch 'master' into compilade/cuda-tq2_0
    compilade committed 1 year ago
  • ggml-cuda : remove some superfluous comments for TQ2_0 tile loading
    compilade committed 1 year ago
  • ggml-cuda : slight optimizations for TQ2_0
    compilade committed 1 year ago
  • ggml-metal : supports_op returns false for ternary types
    compilade committed 1 year ago
  • ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1
    compilade committed 1 year ago
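The commits above add CUDA kernels for the TQ2_0 ternary type (weights in {-1, 0, 1}, packed at 2 bits each with a per-block scale). As a rough illustration of what such a type involves, the sketch below dequantizes one block on the host. The struct name, the sequential bit packing, and the use of a plain `float` scale are simplifying assumptions for this example; ggml's actual TQ2_0 layout interleaves the 2-bit codes differently and stores the scale as fp16.

```c
#include <stdint.h>

// Hypothetical, simplified TQ2_0-style block: 256 ternary weights packed
// at 2 bits each (64 bytes of codes) plus one block scale.
// This is NOT ggml's exact layout; it only illustrates the idea.
#define TQ2_0_BLOCK 256

typedef struct {
    uint8_t qs[TQ2_0_BLOCK / 4]; // 2-bit codes, 4 per byte (sequential packing assumed)
    float   d;                   // block scale (fp16 in ggml; float here for clarity)
} block_tq2_0;

// Unpack one block: a 2-bit code c in {0, 1, 2} maps to the ternary
// weight (c - 1) in {-1, 0, 1}, which is then scaled by d.
static void dequantize_tq2_0(const block_tq2_0 *b, float *out) {
    for (int i = 0; i < TQ2_0_BLOCK; ++i) {
        const uint8_t c = (b->qs[i / 4] >> (2 * (i % 4))) & 3;
        out[i] = ((int)c - 1) * b->d;
    }
}
```

In a CUDA kernel like the `vec_dot_tq2_0_q8_1` mentioned in the last commit, this unpacking would instead happen per thread inside a dot product against a Q8_1-quantized activation block, so the weights never materialize as floats in global memory.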