llama.cpp
CUDA: larger SRAM reads for tile FA, AMD FP16 dot #15927 (Merged)

Commit 8821183a (JohannesGaessler): CUDA: larger SRAM reads for tile FA, AMD FP16 dot
github-actions added labels: Nvidia GPU, ggml
Commit fe4eb4f8 (JohannesGaessler): fix logic for availability of v_dot2_f32_f16
JohannesGaessler force-pushed from 4ff67318 to fe4eb4f8
slaren approved these changes on 2025-09-11
JohannesGaessler merged 0e6ff004 into master
