llama.cpp — PR #8062 (merged)

CUDA: optimize MMQ int8 tensor core performance

Author: JohannesGaessler
github-actions added labels: Nvidia GPU, ggml
slaren commented on 2024-06-22
slaren approved these changes on 2024-06-24
Commits and timeline:
- JohannesGaessler force-pushed from bc2cbd56 to 5714f000 (1 year ago)
- db6dae79 — CUDA: optimize MMQ int8 tensor core performance
- cab59819 — only a single get_mma_tile_x_k function
- 5db21312 — simplify code, make functions constexpr
- JohannesGaessler force-pushed from 5714f000 to 5db21312 (1 year ago)
- JohannesGaessler merged 9a590c82 into master (1 year ago)
