llama.cpp
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 #7681

Merged
Opened by JohannesGaessler
github-actions added labels: Nvidia GPU, ggml
slaren approved these changes on 2024-06-01
Commit 90021615: CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8
JohannesGaessler force-pushed to 90021615 1 year ago
JohannesGaessler merged 750f60c0 into master 1 year ago
