llama.cpp — PR #8062 (merged)

CUDA: optimize MMQ int8 tensor core performance

Author: JohannesGaessler
github-actions added labels: Nvidia GPU, ggml
slaren commented on 2024-06-22
slaren approved these changes on 2024-06-24
Commits and timeline:
- JohannesGaessler force-pushed from bc2cbd56 to 5714f000 (1 year ago)
- db6dae79 — CUDA: optimize MMQ int8 tensor core performance
- cab59819 — only a single get_mma_tile_x_k function
- 5db21312 — simplify code, make functions constexpr
- JohannesGaessler force-pushed from 5714f000 to 5db21312 (1 year ago)
- JohannesGaessler merged 9a590c82 into master (1 year ago)
