llama.cpp
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
#11183
Open

Commits
  • ggml-cuda : add TQ2_0 support
    compilade committed 1 year ago
  • ggml-cuda : cleanup TQ2_0
    compilade committed 1 year ago
  • Merge branch 'master' into compilade/cuda-tq2_0
    compilade committed 1 year ago
  • ggml-cuda : remove some superfluous comments for TQ2_0 tile loading
    compilade committed 1 year ago
  • ggml-cuda : slight optimizations for TQ2_0
    compilade committed 1 year ago
  • ggml-metal : supports_op returns false for ternary types
    compilade committed 1 year ago
  • ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1
    compilade committed 1 year ago
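The commits above add CUDA kernels for the TQ2_0 ternary type (weights in {-1, 0, 1}, packed at 2 bits each with a per-block scale). As a rough illustration of what such a type involves, the sketch below dequantizes one block on the host. The struct name, the sequential bit packing, and the use of a plain `float` scale are simplifying assumptions for this example; ggml's actual TQ2_0 layout interleaves the 2-bit codes differently and stores the scale as fp16.

```c
#include <stdint.h>

// Hypothetical, simplified TQ2_0-style block: 256 ternary weights packed
// at 2 bits each (64 bytes of codes) plus one block scale.
// This is NOT ggml's exact layout; it only illustrates the idea.
#define TQ2_0_BLOCK 256

typedef struct {
    uint8_t qs[TQ2_0_BLOCK / 4]; // 2-bit codes, 4 per byte (sequential packing assumed)
    float   d;                   // block scale (fp16 in ggml; float here for clarity)
} block_tq2_0;

// Unpack one block: a 2-bit code c in {0, 1, 2} maps to the ternary
// weight (c - 1) in {-1, 0, 1}, which is then scaled by d.
static void dequantize_tq2_0(const block_tq2_0 *b, float *out) {
    for (int i = 0; i < TQ2_0_BLOCK; ++i) {
        const uint8_t c = (b->qs[i / 4] >> (2 * (i % 4))) & 3;
        out[i] = ((int)c - 1) * b->d;
    }
}
```

In a CUDA kernel like the `vec_dot_tq2_0_q8_1` mentioned in the last commit, this unpacking would instead happen per thread inside a dot product against a Q8_1-quantized activation block, so the weights never materialize as floats in global memory.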