llama.cpp
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support
#22196

Merged

am17an merged 23 commits into ggml-org:master from michaelw9999:nvfp4-blackwell

Blackwell NVFP4 MMQ Kernel

a0818450

Removed whitespace

9fb7e840

Added FP8 Max definition and description

0bcf7b29

Fixed 'f' typo

4625a7cc

Removed whitespace from comment

3ea6b59d

Guard Blackwell NVFP4 quantizer for Blackwell only

db5957e7

Merged vec_dot_fp4_fp4_mma together

83b412f0

Refactored to use 76-byte MMQ_MMA_TILE_X_K_FP4 and block_fp4_mmq inst…

c3188065

Updated block_fp4_mmq packing comment

78596bfa

Added assert for QK_K == 8 * QK_MXFP4 in mul_mat_q

a68327c7

Removed extra space typo

6e31a22b

Changed NVFP4 quant assert and using get_int_b4

58e277e4

Removed bool has_ids template from quantize

0e2c7948

Updated block_fp4_mmq packing comment

72fc0170

Added ue4m3 bounds check for testscale

7fcc8c07

Removed whitespace on line 52 of mmq.cuh

7c73198d

Fixed MMQ_ITER_K_FP4 returning on non-FP4 models when running on Blac…

6b26a1c7

Change GGML_ASSERT to static_assert

e34b6ff6

Whitespace fixes

02df2638

Change amax_raw mul 1/6 to: / 6

92045908

Hoisted kbx0 and kbx out of the loop

667cc38d

Update ggml/src/ggml-cuda/mmq.cuh

553c3a85

Add endif blackwell mma comment

0d9e0458
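Several commits above touch the NVFP4 quantizer: an FP8 max definition, `amax_raw / 6`, a `ue4m3` bounds check for the test scale, and a `static_assert` on `QK_K == 8 * QK_MXFP4`. The host-side C++ sketch below shows that general idea under stated assumptions. It is not the PR's CUDA code. `fp4_block_scale`, `fp4_encode`, `FP4_E2M1_MAX`, `UE4M3_MAX` and `kE2M1Values` are hypothetical names. The 448 bound assumes the unsigned E4M3 scale shares the E4M3 finite maximum. The per-tensor scale and the rounding of the block scale to E4M3 precision are left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Block sizes as defined in ggml (QK_K = 256, QK_MXFP4 = 32).
constexpr int QK_K     = 256;
constexpr int QK_MXFP4 = 32;
// Compile-time layout check, analogous to the static_assert named in the commit list.
static_assert(QK_K == 8 * QK_MXFP4, "a QK_K super-block must hold exactly 8 MXFP4 blocks");

// Largest magnitude representable in FP4 E2M1.
constexpr float FP4_E2M1_MAX = 6.0f;
// Largest finite FP8 E4M3 value, used here as the upper bound of an
// unsigned E4M3 ("ue4m3") block scale (assumption).
constexpr float UE4M3_MAX = 448.0f;

// Magnitudes of the 8 non-negative E2M1 codes, indexed by code 0..7.
constexpr float kE2M1Values[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Hypothetical helper: block scale = amax / 6, bounds-checked against the UE4M3 range.
// Rounding the scale itself to E4M3 precision is omitted for brevity.
float fp4_block_scale(const float * x, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    const float scale = amax / FP4_E2M1_MAX;  // divide by 6 rather than multiply by 1/6
    return std::min(scale, UE4M3_MAX);
}

// Hypothetical helper: encode x / scale as a 4-bit E2M1 code (bit 3 = sign).
// Picks the nearest representable magnitude; ties go to the smaller magnitude,
// and values beyond 6.0 saturate to code 7.
uint8_t fp4_encode(float x, float scale) {
    const float v    = scale > 0.0f ? std::fabs(x) / scale : 0.0f;
    int         best = 0;
    for (int c = 1; c < 8; ++c) {
        if (std::fabs(kE2M1Values[c] - v) < std::fabs(kE2M1Values[best] - v)) {
            best = c;
        }
    }
    const uint8_t sign = (x < 0.0f && best != 0) ? 0x8 : 0x0;
    return static_cast<uint8_t>(sign | best);
}
```

For example, a 16-element block whose largest magnitude is 12.0 gets a scale of 2.0. 12.0 then encodes to code 7 (6.0) and -3.0 to code 0b1011 (-1.5). A block with amax 6000.0 would need a scale of 1000.0, so the bounds check clamps it to 448.0.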

michaelw9999 requested a review from ggerganov 58 days ago

michaelw9999 requested a review 58 days ago

github-actions added the testing label

github-actions added the Nvidia GPU label

github-actions added the ggml label

michaelw9999 marked this pull request as draft 52 days ago

am17an approved these changes on 2026-04-28

JohannesGaessler approved these changes on 2026-04-28

michaelw9999 marked this pull request as ready for review 51 days ago

am17an merged fc2b0053 into master 50 days ago

Reviewers

JohannesGaessler

am17an

ggerganov

Assignees

No one assigned

Labels

testing Nvidia GPU ggml

Milestone

No milestone