llama.cpp
ggml-cuda: Add NVFP4 dp4a kernel
#20644
Merged

IMbackK merged 14 commits into ggml-org:master from michaelw9999:nvfp4-dp4a
michaelw9999
michaelw9999 requested a review 18 days ago
github-actions added Nvidia GPU
github-actions added python
github-actions added ggml
JohannesGaessler commented on 2026-03-16
michaelw9999
JohannesGaessler
michaelw9999
michaelw9999 force pushed from 6e5081b5 to b581acea 13 days ago
michaelw9999 requested a review from CISC 13 days ago
michaelw9999 requested a review from ggerganov 13 days ago
michaelw9999 force pushed from b581acea to c0dce55b 13 days ago
michaelw9999 force pushed from c0dce55b to ce2e0602 13 days ago
michaelw9999 force pushed from 9decc622 to cd4f809c 12 days ago
michaelw9999
JohannesGaessler requested changes on 2026-03-22
michaelw9999 force pushed from 037d135d to 1d9aa514 12 days ago
am17an
michaelw9999
am17an
michaelw9999
am17an
xkmire
michaelw9999
michaelw9999 force pushed from 4e7736d5 to f8b338e0 11 days ago
am17an commented on 2026-03-16
michaelw9999 Forced F32 path for NVFP4/Cublas and removed Fusion/TensorScale
fa79ea63
michaelw9999 force pushed from 24de2119 to fa79ea63 11 days ago
michaelw9999 Removed stale code
53450f12
michaelw9999 Renamed k to ne
7fd898be
michaelw9999 Added check for dst_t to cuda_cast template for float
caa8fba0
am17an commented on 2026-03-24
am17an commented on 2026-03-24
am17an commented on 2026-03-24
michaelw9999 Restored ggml_cuda_ue4m3_to_fp32, changed vecdot ints to int32ts
55acc41c
JohannesGaessler
michaelw9999 Simplified ggml_cuda_ue4m3_to_fp32
5a7e19b4
michaelw9999
JohannesGaessler commented on 2026-03-25
michaelw9999
michaelw9999 Removed NVFP4-MMQ block checks
82f0e6bb
michaelw9999 Added CUDART/HIP Check and HIP/fp8 include
daf439b1
michaelw9999 requested a review from IMbackK 10 days ago
michaelw9999 Added NVFP4 to Test-backend-ops
9ab7cf21
michaelw9999 Added hip_fp8_e4m3 to __nv_fp8_e4m3 typedef
e30f0b3c
michaelw9999
michaelw9999 Restored last include to baseline
e728b2a4
michaelw9999 Removed whitespace artifacts
8c8f368a
github-actions added testing
JohannesGaessler approved these changes on 2026-03-25
JohannesGaessler requested a review from am17an 9 days ago
michaelw9999 Update ggml/src/ggml-cuda/ggml-cuda.cu
0780545b
am17an approved these changes on 2026-03-25
JohannesGaessler try CI fix
af41687c
am17an approved these changes on 2026-03-26
IMbackK approved these changes on 2026-03-26
IMbackK merged 112c7815 into master 9 days ago
CISC
CISC
IMbackK
CISC
IMbackK
michaelw9999 deleted the nvfp4-dp4a branch 8 days ago

