llama.cpp
CUDA: use tensor cores for MMQ
#7676
Merged


Author: JohannesGaessler
mofosyne added the Review Complexity: High label
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler force-pushed to bf10e133
JohannesGaessler marked this pull request as ready for review
Commit bd89bb37 (JohannesGaessler): CUDA: int8 tensor cores for MMQ (legacy quants)
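Note: MMQ (mul_mat_q) quantizes the activations to int8 and accumulates int8 dot products against the quantized weights, which is what the int8 tensor core path maps onto mma hardware. The PR's kernels use their own fragment layouts and fuse the dequantization scales; purely as an illustration of the underlying int8 tensor core primitive (assuming a device with compute capability >= 7.2), a minimal WMMA-based sketch looks like this:

```cpp
// Illustration only, not the PR's implementation: one warp computes a
// 16x16 int32 tile from two 16x16 int8 tiles via the WMMA API.
#include <mma.h>
using namespace nvcuda;

__global__ void int8_wmma_tile(const signed char * a, const signed char * b, int * c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, 16);          // 16x16 int8 tile, leading dim 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // int8 x int8 -> int32 on tensor cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Launch with a single warp, e.g. int8_wmma_tile<<<1, 32>>>(a, b, c);
```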
Commit 054d4ea9 (JohannesGaessler): fix out-of-bounds writes
JohannesGaessler force-pushed to 054d4ea9
slaren approved these changes on 2024-06-10
Commit a9cde5c6 (JohannesGaessler): __builtin_assume -> GGML_CUDA_ASSUME
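For context, __builtin_assume tells the compiler a condition holds (for example that a lane or tile index stays below a fixed bound) so it can drop bounds checks when unrolling; wrapping it in a macro lets the same kernels build on toolchains where the builtin is unavailable. A sketch of the idea follows; the version guard is an assumption, not the exact condition used in ggml-cuda:

```cpp
// Sketch only: expand to __builtin_assume(x) where the toolkit supports it in
// device code, otherwise compile to nothing. The CUDART_VERSION threshold is
// an assumption, not the exact guard in ggml-cuda/common.cuh.
#if defined(__CUDACC__) && CUDART_VERSION >= 11100
#define GGML_CUDA_ASSUME(x) __builtin_assume(x)
#else
#define GGML_CUDA_ASSUME(x)
#endif

// Typical use inside a kernel, where i0 is known to stay below the tile size:
// GGML_CUDA_ASSUME(i0 < 32);
```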
Commit a64a81a2 (JohannesGaessler): fix writeback returning too early
JohannesGaessler merged 1f0dabda into master
