llama.cpp
ggml-cuda: Add generic NVFP4 MMQ kernel
#21074
Open

michaelw9999 wants to merge 19 commits into ggml-org:master from michaelw9999:nvfp4-mmq-mma
michaelw9999
michaelw9999 requested a review 4 days ago
github-actions added the Nvidia GPU label
github-actions added the ggml label
michaelw9999 Introduced NVFP4 generic MMQ kernel
94e58bec
michaelw9999 Added extra FP8 guard, hope to solve ci HIP failure
2761dcaf
michaelw9999 force pushed from 9fd81b07 to 2761dcaf 4 days ago
am17an
michaelw9999
IMbackK
am17an
michaelw9999
am17an commented on 2026-03-28
michaelw9999
xkmire
michaelw9999
michaelw9999 requested a review from IMbackK 3 days ago
michaelw9999 Rename tiles and use HIP_FP8_AVAILABLE
cbd9fba6
michaelw9999 force pushed from 29b4a8d6 to cbd9fba6 3 days ago
am17an commented on 2026-03-28
am17an commented on 2026-03-28
michaelw9999 Removed remaining FP8 straggler and added const int
0d9292cf
michaelw9999 Const
0018ce86
am17an commented on 2026-03-28
michaelw9999 Removed DECL_MMQ_CASE artifact
592e18cc
am17an commented on 2026-03-28
michaelw9999 Removed newline
1489ea54
michaelw9999 Removed space after else
3177030d
IMbackK commented on 2026-03-28
michaelw9999 Changed HIP FP8 NVFP4 conversion gate
ebe28e97
michaelw9999 Added new line to bottom of mmq.cu 270
aa55cb35
michaelw9999 Removed extra spaces
8af43252
michaelw9999 Removed single space in front of else on line 814
d8c5b7b6
michaelw9999 Added NVFP4 to generate cu script so HIP can see it, further tightene…
cba8605e
github-actions added the python label
michaelw9999
am17an
michaelw9999 Include generated mmq-instance-nvfp4.cu
a2f724da
michaelw9999
michaelw9999
am17an
am17an approved these changes on 2026-03-29
IMbackK
michaelw9999 Added NVFP4 mmq to HIP Check ignore list
4be4b92b
github-actions added the script label
michaelw9999
am17an
JohannesGaessler commented on 2026-03-30
michaelw9999 Update ggml/src/ggml-cuda/mmq.cuh
30d7c8c1
michaelw9999 Update ggml/src/ggml-cuda/mmq.cuh
145d8f18
michaelw9999 Update ggml/src/ggml-cuda/mmq.cuh
bf496f6a
michaelw9999 Added function names to closing endif
e2babc37
michaelw9999
JohannesGaessler
michaelw9999
IMbackK
drrros
michaelw9999
