llama.cpp
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) #7860
Merged

Author: JohannesGaessler
JohannesGaessler force-pushed from 48ecafb2 to dc0ef0c4
slaren commented on 2024-06-10
Commit 8cb2dbd1: CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K)
github-actions added labels: Nvidia GPU, ggml
JohannesGaessler force-pushed from dc0ef0c4 to 8cb2dbd1
slaren approved these changes on 2024-06-10
mofosyne added label: Review Complexity : High
JohannesGaessler merged bdcb8f42 into master