llama.cpp
ggml : add NVFP4 quantization type support
#19769
Merged
CISC merged 52 commits into ggml-org:master from richarddd:feat/nvfp4
richarddd requested a review from ggerganov 40 days ago
richarddd requested a review from CISC 40 days ago
richarddd requested a review from 0cc4m 40 days ago
richarddd requested a review from JohannesGaessler 40 days ago
richarddd force pushed from 9cd0f586 to 86dd3fc6 40 days ago
github-actions added testing
github-actions added Nvidia GPU
github-actions added Vulkan
github-actions added python
github-actions added ggml
github-actions added Apple Metal
richarddd marked this pull request as draft 38 days ago
richarddd force pushed from 5f8f21bb to ffab58b2 38 days ago
github-actions added examples
richarddd marked this pull request as ready for review 37 days ago
am17an commented on 2026-02-25
am17an commented on 2026-02-25
am17an commented on 2026-02-25
CISC commented on 2026-02-25
am17an commented on 2026-02-26
richarddd force pushed from 3cbb4e37 to fa669191 34 days ago
am17an commented on 2026-03-01
CISC commented on 2026-03-01
CISC commented on 2026-03-01
richarddd changed the title from "WIP: ggml : add NVFP4 quantization type support" to "ggml : add NVFP4 quantization type support" 30 days ago
ggerganov commented on 2026-03-04
github-actions added model
WIP: add NVFP4 quantization support
98bf995c
tests
7138a3cc
improve NVFP4 dot product implementation performance and fix bad sup…
91fd8f7b
typo
c55390f8
Use nvfp4 kvalues
8a3b35f4
vulkan : fix NVFP4 shader compilation by including kvalues_mxfp4 look…
9a3d804c
vulcal and perf fixes
ad18a561
wip
270eba7a
Fix metal
03df285f
fix vulcan
457ee2b7
Rename threshold & fix wrong scale
2d91e235
Fix MOE
9936919b
Shelf backend implementations (CUDA, Metal, Vulkan, arch-specific SIMD)
4303f97d
Fix arch-fallback.h: add NVFP4 generic fallback for all platforms
39a3734f
quantize: add NVFP4 as a quantization type option
256d0b1b
Fix ggml_fp32_to_ue4m3: handle subnormal values
ddc93e50
Restore ARM NEON NVFP4 dot product implementation
1d291020
Optimize ARM NEON NVFP4 dot product: LUT + vpaddq + vfmaq
0d015ef5
ARM NEON NVFP4: rearrange q8 to match nibble layout
525c76a4
CPU only backend 64 super-block layout
707e088d
cleanup
4e4275d0
Remove unused LUT
48632879
int
3c3e662a
exclude NVFP4 from unsupported ops in metal build
93bbcadf
remove quantization for now
9ceb0025
store scales as native UE4M3, preserve original model bits when possible
4bfb3188
Update convert_hf_to_gguf.py
3b4ebe5b
correct comment
ad9d68a6
format
0717bfc4
reduce duplication and cleanup
bd4b67ce
Address comments
a3b0c749
move detection to prepare_tensors
396b8241
Use math instead of const
c21e1df8
Move
5cc15166
fix comment
f2b6dce1
Shelf quantize tests
a567d3ed
Rebase and move check
677eedb0
cleanup
7dead730
lint
e400ac78
Update gguf-py/gguf/scripts/gguf_convert_endian.py
52b25d3e
organize
b005f559
Refactor
db618aa7
richarddd force pushed from 93ab4d7a to db618aa7 24 days ago
ggerganov approved these changes on 2026-03-09
ggerganov requested a review from CISC 24 days ago
CISC approved these changes on 2026-03-09
Update convert_hf_to_gguf.py
2b465a6f
Update convert_hf_to_gguf.py
27c28316
Update convert_hf_to_gguf.py
d2b9d373
add quantize_nvfp4 (required for test_quants.py)
0870ec53
add quantize_nvfp4 (required for test_quants.py)
238a9125
add quantize_nvfp4 (required for test_quants.py)
215787e0
fix return type
dab2f826
ORippler dismissed these changes on 2026-03-10
Merge branch 'master' into feat/nvfp4
51f757c2
CISC dismissed their stale review 21 days ago
https://github.com/ggml-org/llama.cpp/pull/19769#issuecomment-4040742527
CISC merged 5eae9cb1 into master 21 days ago
richarddd deleted the feat/nvfp4 branch 21 days ago
Reviewers
CISC
ggerganov
ORippler
pwilkin
am17an
mishig25
compilade
0cc4m
JohannesGaessler
Assignees
No one assigned
Labels
model
testing
Nvidia GPU
Vulkan
examples
python
ggml
Apple Metal
Milestone
No milestone