llama.cpp
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
#11183
Open
compilade wants to merge 7 commits into master from compilade/cuda-tq2_0
ggml-cuda : add TQ2_0 support (970b5ab7)
ggml-cuda : cleanup TQ2_0 (fb43d5e8)
Merge branch 'master' into compilade/cuda-tq2_0 (983aa09b)
ggml-cuda : remove some superfluous comments for TQ2_0 tile loading (f5fddb6d)
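For context on what these commits add: TQ2_0 stores ternary weights {-1, 0, +1} as 2-bit codes, four per byte, with one fp16 scale per 256-weight block, so a stored code decodes as (code - 1) * d. The sketch below is a minimal, illustrative CUDA dequantization kernel under that layout assumption; it is not the PR's actual kernel (which also covers the quantized mat-vec and MMQ paths), and the struct and function names here are invented for the example.

```cuda
// Minimal sketch only: dequantize a TQ2_0-like block to float on the GPU,
// assuming 256 weights per block, a per-block fp16 scale `d`, and 2-bit
// codes {0,1,2} packed four per byte that decode as (code - 1) * d.
// block_tq2_0_sketch / dequantize_tq2_0_sketch are illustrative names,
// not the identifiers used in the PR.
#include <cuda_fp16.h>
#include <stdint.h>

#define QK 256  // assumed block size (QK_K in ggml)

struct block_tq2_0_sketch {
    uint8_t qs[QK / 4]; // 2-bit ternary codes, 4 per byte
    half    d;          // per-block scale
};

// one CUDA block per quantized block, 64 threads -> one packed byte each
__global__ void dequantize_tq2_0_sketch(const block_tq2_0_sketch * x, float * y) {
    const int   ib = blockIdx.x;   // quantized block index
    const int   t  = threadIdx.x;  // 0..63, one byte per thread
    const float d  = __half2float(x[ib].d);

    const uint8_t q = x[ib].qs[t];
    // assumed packing: bytes 0..31 cover the first 128 weights (bit pair l
    // spans 32 consecutive weights), bytes 32..63 cover the second 128
    const int half_idx = t / 32;   // which 128-weight half
    const int n        = t % 32;   // position within a 32-weight run
    for (int l = 0; l < 4; ++l) {
        const int code = (q >> (2*l)) & 3;                       // 0, 1 or 2
        y[ib*QK + half_idx*128 + l*32 + n] = d * (code - 1);     // -d, 0 or +d
    }
}
```

A launch for n weights (n a multiple of 256) would look like dequantize_tq2_0_sketch<<<n / QK, 64>>>(x, y).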
compilade added the enhancement, performance, Review Complexity : High, and ggml labels
compilade requested a review from JohannesGaessler 302 days ago
github-actions added the testing, Nvidia GPU, and python labels
compilade commented on 2025-01-10
JohannesGaessler commented on 2025-01-11
ggml-cuda : slight optimizations for TQ2_0 (946796fc)
ggml-metal : supports_op returns false for ternary types (b6fc9f03)
ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1 (fbddb262)
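The last commit above renames loop indices in vec_dot_tq2_0_q8_1, the per-block dot product between TQ2_0 weights and Q8_1 activations used by the quantized mat-vec path. The sketch below only illustrates the offset-correction arithmetic such a dot product can use: since ternary codes carry a +1 bias, the bias is removed once per block via the sum that Q8_1 already stores. It ignores the real interleaved packing and block structs, and every name in it is illustrative rather than taken from the PR.

```cuda
// Sketch of the bias-correction trick for a TQ2_0 x Q8_1 dot product.
// With codes c in {0,1,2} stored as (w + 1), the exact dot is
//   sum_i (c_i - 1)*d2 * q8_i*d8  =  d2*d8 * sum_i c_i*q8_i  -  d2*s8,
// where s8 = d8 * sum_i q8_i is the per-block sum Q8_1 keeps.
// Packing here is simplified to 4 consecutive codes per byte (the real
// layout is interleaved). Requires sm_61+ for __dp4a.
#include <stdint.h>

__device__ float vec_dot_tq2_0_q8_1_sketch(
        const uint8_t * q2,   // 8 packed bytes -> 32 ternary codes
        const int8_t  * q8,   // 32 int8 activations
        float d2,             // TQ2_0 scale
        float d8,             // Q8_1 scale
        float s8) {           // Q8_1 precomputed sum: d8 * sum(q8)
    int sumi = 0;
#pragma unroll
    for (int i = 0; i < 8; ++i) {
        // reinterpret 4 int8 values as one 32-bit lane (alignment assumed)
        const int q8v = *(const int *)(q8 + 4*i);
        // expand the 4 two-bit codes of one byte into 4 bytes of an int
        const uint8_t b = q2[i];
        const int q2v = (b & 3)
                      | (((b >> 2) & 3) <<  8)
                      | (((b >> 4) & 3) << 16)
                      | (((b >> 6) & 3) << 24);
        sumi = __dp4a(q2v, q8v, sumi);  // accumulate sum_i c_i * q8_i
    }
    return d2*d8*sumi - d2*s8;          // subtract the +1 bias once
}
```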
github-actions added the Apple Metal label
Reviewers: JohannesGaessler
Assignees: No one assigned
Labels: enhancement, performance, testing, Nvidia GPU, python, Review Complexity : High, ggml, Apple Metal
Milestone: No milestone