llama.cpp
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
#7921
Merged
JohannesGaessler merged 6 commits into ggml-org:master from JohannesGaessler:cuda-ptx-mma-17
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (d962a56b)
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler added the Review Complexity : High label
try CI fix (87099452)
try CI fix (46b4054e)
try CI fix (80ba2aef)
fix data race (bff3a209)
revert q2_K precision related changes (1d9dd480)
slaren approved these changes on 2024-06-14
JohannesGaessler merged 76d66ee0 into master (1 year ago)
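The PR carries no description body; its title and source branch (cuda-ptx-mma-17) indicate that the q2_K and q3_K MMQ (quantized matrix multiplication) kernels are accelerated by driving the int8 tensor cores through PTX mma instructions. The sketch below is a minimal, illustrative int8 `mma.sync` wrapper in CUDA and is not the PR's actual implementation; the function and kernel names are hypothetical, and it assumes an Ampere-or-newer GPU (sm_80), where the m16n8k16 s8 MMA shape is available.

```cuda
// Hypothetical illustration only: a bare int8 tensor-core MMA wrapper.
// D (16x8, int32) = A (16x16, int8, row-major) * B (16x8, int8, col-major) + C
// Per thread in a warp: A -> 2x 32-bit regs, B -> 1x 32-bit reg, C/D -> 4x 32-bit regs.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

__device__ __forceinline__ void mma_m16n8k16_s8(
        int32_t (&d)[4], const uint32_t (&a)[2], const uint32_t b, const int32_t (&c)[4]) {
#if __CUDA_ARCH__ >= 800
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.s32.s8.s8.s32 "
        "{%0, %1, %2, %3}, {%4, %5}, {%6}, {%7, %8, %9, %10};"
        : "=r"(d[0]), "=r"(d[1]), "=r"(d[2]), "=r"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(b),
          "r"(c[0]), "r"(c[1]), "r"(c[2]), "r"(c[3]));
#endif
}

// Toy kernel: each thread feeds constant int8 fragments through one MMA and
// lane 0 prints one accumulator value, just to show the call shape.
__global__ void mma_demo() {
    uint32_t a[2] = {0x01010101u, 0x01010101u}; // 8 int8 values of 1 per thread
    uint32_t b    = 0x02020202u;                // 4 int8 values of 2 per thread
    int32_t  c[4] = {0, 0, 0, 0};
    int32_t  d[4] = {0, 0, 0, 0};
    mma_m16n8k16_s8(d, a, b, c);
    if (threadIdx.x == 0) {
        printf("d[0] = %d\n", d[0]); // each output is a 16-term dot product: 16 * 1 * 2 = 32
    }
}

int main() {
    mma_demo<<<1, 32>>>(); // mma.sync is warp-wide, so launch exactly one full warp
    cudaDeviceSynchronize();
    return 0;
}
```

In the real kernels the quantized q2_K/q3_K blocks would first be unpacked into int8 tiles in shared memory and the int32 accumulators rescaled by the block scales; none of that is shown here.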
Reviewers: slaren
Assignees: No one assigned
Labels: Nvidia GPU, Review Complexity : High, ggml
Milestone: No milestone