llama.cpp
ggml-cuda: Add NVFP4 dp4a kernel
#20644
Merged
IMbackK merged 14 commits into ggml-org:master from michaelw9999:nvfp4-dp4a
michaelw9999 requested a review 18 days ago
github-actions added Nvidia GPU
github-actions added python
github-actions added ggml
JohannesGaessler commented on 2026-03-16
michaelw9999 force pushed from 6e5081b5 to b581acea 13 days ago
michaelw9999 requested a review from CISC 13 days ago
michaelw9999 requested a review from ggerganov 13 days ago
michaelw9999 force pushed from b581acea to c0dce55b 13 days ago
michaelw9999 force pushed from c0dce55b to ce2e0602 13 days ago
michaelw9999 force pushed from 9decc622 to cd4f809c 12 days ago
JohannesGaessler requested changes on 2026-03-22
michaelw9999 force pushed from 037d135d to 1d9aa514 12 days ago
michaelw9999 force pushed from 4e7736d5 to f8b338e0 11 days ago
am17an commented on 2026-03-16
Forced F32 path for NVFP4/Cublas and removed Fusion/TensorScale (fa79ea63)
michaelw9999 force pushed from 24de2119 to fa79ea63 11 days ago
Removed stale code (53450f12)
Renamed k to ne (7fd898be)
Added check for dst_t to cuda_cast template for float (caa8fba0)
am17an commented on 2026-03-24
am17an commented on 2026-03-24
am17an commented on 2026-03-24
Restored ggml_cuda_ue4m3_to_fp32, changed vecdot ints to int32ts (55acc41c)
Simplified ggml_cuda_ue4m3_to_fp32 (5a7e19b4)
JohannesGaessler commented on 2026-03-25
Removed NVFP4-MMQ block checks (82f0e6bb)
Added CUDART/HIP Check and HIP/fp8 include (daf439b1)
michaelw9999 requested a review from IMbackK 10 days ago
Added NVFP4 to Test-backend-ops (9ab7cf21)
Added hip_fp8_e4m3 to __nv_fp8_e4m3 typedef (e30f0b3c)
Restored last include to baseline (e728b2a4)
Removed whitespace artifacts (8c8f368a)
github-actions added testing
JohannesGaessler approved these changes on 2026-03-25
JohannesGaessler requested a review from am17an 9 days ago
Update ggml/src/ggml-cuda/ggml-cuda.cu (0780545b)
am17an approved these changes on 2026-03-25
try CI fix (af41687c)
am17an approved these changes on 2026-03-26
IMbackK approved these changes on 2026-03-26
IMbackK merged 112c7815 into master 9 days ago
michaelw9999 deleted the nvfp4-dp4a branch 8 days ago
Reviewers: IMbackK, am17an, JohannesGaessler, CISC, ggerganov
Assignees: No one assigned
Labels: testing, Nvidia GPU, python, ggml
Milestone: No milestone