llama.cpp
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
#7921
Merged

JohannesGaessler
JohannesGaessler CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
d962a56b
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler added the Review Complexity : High label
JohannesGaessler try CI fix
87099452
JohannesGaessler try CI fix
46b4054e
JohannesGaessler try CI fix
80ba2aef
slaren
JohannesGaessler
slaren
JohannesGaessler
slaren
ggerganov
JohannesGaessler fix data race
bff3a209
JohannesGaessler
slaren
JohannesGaessler
slaren
JohannesGaessler
JohannesGaessler revert q2_K precision related changes
1d9dd480
JohannesGaessler
JohannesGaessler
slaren
slaren
slaren approved these changes on 2024-06-14
JohannesGaessler merged 76d66ee0 into master 1 year ago
bartowski1182
JohannesGaessler