llama.cpp
CUDA: use tensor cores for MMQ
#7676
Merged


Author: JohannesGaessler
mofosyne added the Review Complexity: High label
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler force-pushed to bf10e133
JohannesGaessler marked this pull request as ready for review
Commit bd89bb37 (JohannesGaessler): CUDA: int8 tensor cores for MMQ (legacy quants)
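Note: MMQ (mul_mat_q) quantizes the activations to int8 and accumulates int8 dot products against the quantized weights, which is what the int8 tensor core path maps onto mma hardware. The PR's kernels use their own fragment layouts and fuse the dequantization scales; purely as an illustration of the underlying int8 tensor core primitive (assuming a device with compute capability >= 7.2), a minimal WMMA-based sketch looks like this:

```cpp
// Illustration only, not the PR's implementation: one warp computes a
// 16x16 int32 tile from two 16x16 int8 tiles via the WMMA API.
#include <mma.h>
using namespace nvcuda;

__global__ void int8_wmma_tile(const signed char * a, const signed char * b, int * c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, 16);          // 16x16 int8 tile, leading dim 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // int8 x int8 -> int32 on tensor cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Launch with a single warp, e.g. int8_wmma_tile<<<1, 32>>>(a, b, c);
```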
Commit 054d4ea9 (JohannesGaessler): fix out-of-bounds writes
JohannesGaessler force-pushed to 054d4ea9
slaren approved these changes on 2024-06-10
Commit a9cde5c6 (JohannesGaessler): __builtin_assume -> GGML_CUDA_ASSUME
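For context, __builtin_assume tells the compiler a condition holds (for example that a lane or tile index stays below a fixed bound) so it can drop bounds checks when unrolling; wrapping it in a macro lets the same kernels build on toolchains where the builtin is unavailable. A sketch of the idea follows; the version guard is an assumption, not the exact condition used in ggml-cuda:

```cpp
// Sketch only: expand to __builtin_assume(x) where the toolkit supports it in
// device code, otherwise compile to nothing. The CUDART_VERSION threshold is
// an assumption, not the exact guard in ggml-cuda/common.cuh.
#if defined(__CUDACC__) && CUDART_VERSION >= 11100
#define GGML_CUDA_ASSUME(x) __builtin_assume(x)
#else
#define GGML_CUDA_ASSUME(x)
#endif

// Typical use inside a kernel, where i0 is known to stay below the tile size:
// GGML_CUDA_ASSUME(i0 < 32);
```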
Commit a64a81a2 (JohannesGaessler): fix writeback returning too early
JohannesGaessler merged 1f0dabda into master
