PR #21074 ggml-cuda: Add generic NVFP4 MMQ kernel

michaelw9999 wants to merge 19 commits into ggml-org:master from michaelw9999:nvfp4-mmq-mma

michaelw9999 requested a review 4 days ago

github-actions added Nvidia GPU

github-actions added ggml

Introduced NVFP4 generic MMQ kernel

94e58bec

Added extra FP8 guard, hope to solve ci HIP failure

2761dcaf

michaelw9999 force pushed from 9fd81b07 to 2761dcaf 4 days ago

am17an commented on 2026-03-28

michaelw9999 requested a review from IMbackK 3 days ago

Rename tiles and use HIP_FP8_AVAILABLE

cbd9fba6

michaelw9999 force pushed from 29b4a8d6 to cbd9fba6 3 days ago

am17an commented on 2026-03-28

Removed remaining FP8 straggler and added const int

0d9292cf

Const

0018ce86

am17an commented on 2026-03-28

Removed DECL_MMQ_CASE artifact

592e18cc

am17an commented on 2026-03-28

Removed newline

1489ea54

Removed space after else

3177030d

IMbackK commented on 2026-03-28

Changed HIP FP8 NVFP4 conversion gate

ebe28e97

Added new line to bottom of mmq.cu 270

aa55cb35

Removed extra spaces

8af43252

Removed single space in front of else on line 814

d8c5b7b6

Added NVFP4 to generate cu script so HIP can see it, further tightene…

cba8605e

github-actions added python

Include generated mmq-instance-nvfp4.cu

a2f724da

am17an approved these changes on 2026-03-29

Added NVFP4 mmq to HIP Check ignore list

4be4b92b

github-actions added script

JohannesGaessler commented on 2026-03-30

Update ggml/src/ggml-cuda/mmq.cuh

30d7c8c1

Update ggml/src/ggml-cuda/mmq.cuh

145d8f18

Update ggml/src/ggml-cuda/mmq.cuh

bf496f6a

Added function names to closing endif

e2babc37

Reviewers

am17an

JohannesGaessler

CISC

IMbackK

Assignees

No one assigned

Labels

script, Nvidia GPU, python, ggml

Milestone

No milestone

llama.cpp ggml-cuda: Add generic NVFP4 MMQ kernel #21074 Open