llama.cpp
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
#11183
Open


compilade wants to merge 7 commits into master from compilade/cuda-tq2_0
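For context on what the kernels implement: TQ2_0 is ggml's ternary quantization type (about 2.06 bits per weight). The sketch below is not code from this PR; it is a minimal CUDA dequantization kernel written against the assumed block layout of the existing CPU reference in ggml-quants.c (256 weights per block, 2 bits per weight striped across 32-byte groups, values {0, 1, 2} shifted to {-1, 0, +1} and multiplied by a single fp16 scale). The struct and kernel names are illustrative.

```cuda
// Minimal sketch (not the PR's kernels): dequantize TQ2_0 blocks to float32.
// Assumes the block layout of the CPU reference in ggml-quants.c:
// 256 weights per block, packed 2 bits each into 64 bytes, plus one fp16 scale.
#include <cuda_fp16.h>
#include <stdint.h>

#define QK_K 256

struct block_tq2_0 {        // assumed mirror of the reference struct
    uint8_t qs[QK_K / 4];   // 4 ternary weights per byte
    __half  d;              // per-block scale
};

__global__ void dequantize_tq2_0_sketch(const block_tq2_0 * __restrict__ x,
                                        float * __restrict__ y,
                                        const int64_t nb) {
    const int64_t ib = blockIdx.x;   // one 256-weight block per CUDA block
    if (ib >= nb) {
        return;
    }
    const float d = __half2float(x[ib].d);

    // Each thread unpacks the 4 weights stored in one byte of qs.
    for (int b = threadIdx.x; b < QK_K / 4; b += blockDim.x) {
        const int grp = b / 32;      // which 32-byte group the byte belongs to
        const int pos = b % 32;      // position of the byte inside its group
        const uint8_t q = x[ib].qs[b];
        for (int l = 0; l < 4; ++l) {
            // 2-bit value {0,1,2} -> {-1, 0, +1}, then apply the block scale
            y[ib*QK_K + grp*128 + l*32 + pos] = (float)(((q >> (2*l)) & 3) - 1) * d;
        }
    }
}
```

Launched as, for example, `dequantize_tq2_0_sketch<<<nb, 64>>>(x, y, nb)`. The PR's kernels presumably fold this unpacking into the matrix multiplication paths rather than materializing floats.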
compilade  ggml-cuda : add TQ2_0 support  970b5ab7
compilade  ggml-cuda : cleanup TQ2_0  fb43d5e8
compilade  Merge branch 'master' into compilade/cuda-tq2_0  983aa09b
compilade  ggml-cuda : remove some superfluous comments for TQ2_0 tile loading  f5fddb6d
compilade added the enhancement, performance, Review Complexity : High, and ggml labels
compilade requested a review from JohannesGaessler 302 days ago
github-actions added the testing, Nvidia GPU, and python labels
compilade commented on 2025-01-10
JohannesGaessler commented on 2025-01-11
compilade  ggml-cuda : slight optimizations for TQ2_0  946796fc
compilade  ggml-metal : supports_op returns false for ternary types  b6fc9f03
compilade  ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1  fbddb262
github-actions added the Apple Metal label
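One of the later commits touches vec_dot_tq2_0_q8_1, which points at the usual MMVQ pattern: activations are quantized to Q8_1 blocks of 32, and each 32-weight slice of a TQ2_0 block is dotted against one of them. The scalar sketch below is illustrative only, not the PR's code; it reuses the assumed block_tq2_0 struct from the sketch above and the block_q8_1 layout from ggml-common.h (fp16 pair ds holding the scale d and s, where s is d times the sum of the quants).

```cuda
// Illustrative scalar sketch of a TQ2_0 x Q8_1 dot product (not the PR's code).
// block_tq2_0 is taken from the dequantization sketch above.
#define QK8_1 32

struct block_q8_1 {       // assumed mirror of the reference layout in ggml-common.h
    __half2 ds;           // ds.x = d (scale), ds.y = s = d * sum(qs)
    int8_t  qs[QK8_1];
};

// k selects which of the 8 Q8_1-sized slices of the 256-weight TQ2_0 block to use.
__device__ float vec_dot_tq2_0_q8_1_sketch(const block_tq2_0 * xb,
                                           const block_q8_1 * yb,
                                           const int k) {
    const int grp   = (k / 4) * 32;   // 32-byte group of qs holding this slice
    const int shift = 2 * (k % 4);    // which 2-bit plane inside each byte

    int sumi = 0;
    for (int m = 0; m < QK8_1; ++m) {
        const int q = (xb->qs[grp + m] >> shift) & 3;   // {0, 1, 2}
        sumi += q * yb->qs[m];
    }

    const float2 ds = __half22float2(yb->ds);
    // The -1 offset of the ternary weights is applied once through s = d * sum(qs)
    // instead of per element: sum((q-1)*q8)*d8 = d8*sum(q*q8) - s.
    return __half2float(xb->d) * (ds.x * (float) sumi - ds.y);
}
```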
