ggml-cuda: Repost of 21896: Blackwell native NVFP4 support #22196
Blackwell NVFP4 MMQ Kernel
a0818450
Removed whitespace
9fb7e840
Added FP8 Max definition and description
0bcf7b29
Fixed 'f' typo
4625a7cc
Removed whitespace from comment
3ea6b59d
Guard Blackwell NVFP4 quantizer for Blackwell only
db5957e7
Merged vec_dot_fp4_fp4_mma together
83b412f0
Refactored to use 76-byte MMQ_MMA_TILE_X_K_FP4 and block_fp4_mmq inst…
c3188065
Updated block_fp4_mmq packing comment
78596bfa
Added assert for QK_K == 8 * QK_MXFP4 in mul_mat_q
a68327c7
Removed extra space typo
6e31a22b
Changed NVFP4 quant assert and using get_int_b4
58e277e4
Removed bool has_ids template from quantize
0e2c7948
Updated block_fp4_mmq packing comment
72fc0170
Added ue4m3 bounds check for testscale
7fcc8c07
Removed whitespace on line 52 of mmq.cuh
7c73198d
Fixed MMQ_ITER_K_FP4 returning on non-FP4 models when running on Blac…
6b26a1c7
Change GGML_ASSERT to static_assert
e34b6ff6
Whitespace fixes
02df2638
Change amax_raw multiplication by 1/6 to division by 6
92045908
Hoisted kbx0 and kbx out of the loop
667cc38d
Update ggml/src/ggml-cuda/mmq.cuh
553c3a85
Add endif blackwell mma comment
0d9e0458
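For readers skimming the commit list, here is a minimal host-side sketch of three ideas the commit titles mention: the compile-time check that QK_K == 8 * QK_MXFP4, dividing the block's absolute maximum by 6, and bounding the resulting scale by the FP8 maximum. This is not the PR's code. All names ending in `_SKETCH` or `_sketch` are hypothetical. The constants are the commonly used values for these formats (6 for the largest E2M1 magnitude, 448 for the largest finite E4M3 value) and are assumed here rather than copied from the diff. Rounding the scale to an actual 8-bit UE4M3 encoding is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed values, named with a _SKETCH suffix to mark them as illustrative.
constexpr int   QK_K_SKETCH         = 256;    // super-block size
constexpr int   QK_MXFP4_SKETCH     = 32;     // FP4 block size
constexpr float FP4_E2M1_MAX_SKETCH = 6.0f;   // largest magnitude in E2M1
constexpr float FP8_E4M3_MAX_SKETCH = 448.0f; // largest finite E4M3 value

// Compile-time check instead of a runtime assert: the relation is between
// constants, so a violation should fail the build rather than the run.
static_assert(QK_K_SKETCH == 8 * QK_MXFP4_SKETCH,
              "QK_K must be 8 FP4 blocks wide");

// Hypothetical helper: derive a per-block scale from the absolute maximum.
// Dividing by 6 maps the largest input onto the largest E2M1 magnitude;
// the clamp keeps the scale inside the finite FP8 E4M3 range.
inline float nvfp4_block_scale_sketch(const float * x, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    const float scale = amax / FP4_E2M1_MAX_SKETCH;
    return std::min(scale, FP8_E4M3_MAX_SKETCH);
}
```

Writing the division as `amax / 6` rather than `amax * (1.0f / 6)` avoids the extra rounding step of first forming 1/6 in float, which is one plausible reason for the change named in the commit title; the PR discussion itself is the authority on the actual motivation.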
am17an
approved these changes
on 2026-04-28
michaelw9999
marked this pull request as ready for review 51 days ago
am17an
merged
fc2b0053
into master 50 days ago
Assignees
No one assigned
Labels
testing
Nvidia GPU
ggml