llama.cpp
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
#11183
Open


compilade wants to merge 7 commits into master from compilade/cuda-tq2_0
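For context on what the kernels implement: TQ2_0 is ggml's ternary quantization type (about 2.06 bits per weight). The sketch below is not code from this PR; it is a minimal CUDA dequantization kernel written against the assumed block layout of the existing CPU reference in ggml-quants.c (256 weights per block, 2 bits per weight striped across 32-byte groups, values {0, 1, 2} shifted to {-1, 0, +1} and multiplied by a single fp16 scale). The struct and kernel names are illustrative.

```cuda
// Minimal sketch (not the PR's kernels): dequantize TQ2_0 blocks to float32.
// Assumes the block layout of the CPU reference in ggml-quants.c:
// 256 weights per block, packed 2 bits each into 64 bytes, plus one fp16 scale.
#include <cuda_fp16.h>
#include <stdint.h>

#define QK_K 256

struct block_tq2_0 {        // assumed mirror of the reference struct
    uint8_t qs[QK_K / 4];   // 4 ternary weights per byte
    __half  d;              // per-block scale
};

__global__ void dequantize_tq2_0_sketch(const block_tq2_0 * __restrict__ x,
                                        float * __restrict__ y,
                                        const int64_t nb) {
    const int64_t ib = blockIdx.x;   // one 256-weight block per CUDA block
    if (ib >= nb) {
        return;
    }
    const float d = __half2float(x[ib].d);

    // Each thread unpacks the 4 weights stored in one byte of qs.
    for (int b = threadIdx.x; b < QK_K / 4; b += blockDim.x) {
        const int grp = b / 32;      // which 32-byte group the byte belongs to
        const int pos = b % 32;      // position of the byte inside its group
        const uint8_t q = x[ib].qs[b];
        for (int l = 0; l < 4; ++l) {
            // 2-bit value {0,1,2} -> {-1, 0, +1}, then apply the block scale
            y[ib*QK_K + grp*128 + l*32 + pos] = (float)(((q >> (2*l)) & 3) - 1) * d;
        }
    }
}
```

Launched as, for example, `dequantize_tq2_0_sketch<<<nb, 64>>>(x, y, nb)`. The PR's kernels presumably fold this unpacking into the matrix multiplication paths rather than materializing floats.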
compilade  ggml-cuda : add TQ2_0 support  970b5ab7
compilade  ggml-cuda : cleanup TQ2_0  fb43d5e8
compilade  Merge branch 'master' into compilade/cuda-tq2_0  983aa09b
compilade  ggml-cuda : remove some superfluous comments for TQ2_0 tile loading  f5fddb6d
compilade added the enhancement, performance, Review Complexity : High, and ggml labels
compilade requested a review from JohannesGaessler 302 days ago
github-actions added the testing, Nvidia GPU, and python labels
compilade commented on 2025-01-10
JohannesGaessler commented on 2025-01-11
compilade  ggml-cuda : slight optimizations for TQ2_0  946796fc
compilade  ggml-metal : supports_op returns false for ternary types  b6fc9f03
compilade  ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1  fbddb262
github-actions added the Apple Metal label
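One of the later commits touches vec_dot_tq2_0_q8_1, which points at the usual MMVQ pattern: activations are quantized to Q8_1 blocks of 32, and each 32-weight slice of a TQ2_0 block is dotted against one of them. The scalar sketch below is illustrative only, not the PR's code; it reuses the assumed block_tq2_0 struct from the sketch above and the block_q8_1 layout from ggml-common.h (fp16 pair ds holding the scale d and s, where s is d times the sum of the quants).

```cuda
// Illustrative scalar sketch of a TQ2_0 x Q8_1 dot product (not the PR's code).
// block_tq2_0 is taken from the dequantization sketch above.
#define QK8_1 32

struct block_q8_1 {       // assumed mirror of the reference layout in ggml-common.h
    __half2 ds;           // ds.x = d (scale), ds.y = s = d * sum(qs)
    int8_t  qs[QK8_1];
};

// k selects which of the 8 Q8_1-sized slices of the 256-weight TQ2_0 block to use.
__device__ float vec_dot_tq2_0_q8_1_sketch(const block_tq2_0 * xb,
                                           const block_q8_1 * yb,
                                           const int k) {
    const int grp   = (k / 4) * 32;   // 32-byte group of qs holding this slice
    const int shift = 2 * (k % 4);    // which 2-bit plane inside each byte

    int sumi = 0;
    for (int m = 0; m < QK8_1; ++m) {
        const int q = (xb->qs[grp + m] >> shift) & 3;   // {0, 1, 2}
        sumi += q * yb->qs[m];
    }

    const float2 ds = __half22float2(yb->ds);
    // The -1 offset of the ternary weights is applied once through s = d * sum(qs)
    // instead of per element: sum((q-1)*q8)*d8 = d8*sum(q*q8) - s.
    return __half2float(xb->d) * (ds.x * (float) sumi - ds.y);
}
```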
