PR #19769 ggml : add NVFP4 quantization type support

CISC merged 52 commits into ggml-org:master from richarddd:feat/nvfp4

richarddd requested a review from ggerganov 40 days ago

richarddd requested a review from CISC 40 days ago

richarddd requested a review from 0cc4m 40 days ago

richarddd requested a review from JohannesGaessler 40 days ago

richarddd force pushed from 9cd0f586 to 86dd3fc6 40 days ago

github-actions added testing

github-actions added Nvidia GPU

github-actions added Vulkan

github-actions added python

github-actions added ggml

github-actions added Apple Metal

richarddd marked this pull request as draft 38 days ago

richarddd force pushed from 5f8f21bb to ffab58b2 38 days ago

github-actions added examples

richarddd marked this pull request as ready for review 37 days ago

am17an commented on 2026-02-25

CISC commented on 2026-02-25

am17an commented on 2026-02-26

richarddd force pushed from 3cbb4e37 to fa669191 34 days ago

am17an commented on 2026-03-01

CISC commented on 2026-03-01

richarddd changed the title ~~WIP: ggml : add NVFP4 quantization type support~~ ggml : add NVFP4 quantization type support 30 days ago

ggerganov commented on 2026-03-04

github-actions added model

WIP: add NVFP4 quantization support

98bf995c

tests

7138a3cc

improve NVFP4 dot product implementation performance and fix bad sup…

91fd8f7b

typo

c55390f8

Use nvfp4 kvalues

8a3b35f4

vulkan : fix NVFP4 shader compilation by including kvalues_mxfp4 look…

9a3d804c

vulcal and perf fixes

ad18a561

wip

270eba7a

Fix metal

03df285f

fix vulcan

457ee2b7

Rename threshold & fix wrong scale

2d91e235

Fix MOE

9936919b

Shelf backend implementations (CUDA, Metal, Vulkan, arch-specific SIMD)

4303f97d

Fix arch-fallback.h: add NVFP4 generic fallback for all platforms

39a3734f

quantize: add NVFP4 as a quantization type option

256d0b1b

Fix ggml_fp32_to_ue4m3: handle subnormal values

ddc93e50

Restore ARM NEON NVFP4 dot product implementation

1d291020

Optimize ARM NEON NVFP4 dot product: LUT + vpaddq + vfmaq

0d015ef5

ARM NEON NVFP4: rearrange q8 to match nibble layout

525c76a4

CPU only backend 64 super-block layout

707e088d

cleanup

4e4275d0

Remove unused LUT

48632879

int

3c3e662a

exclude NVFP4 from unsupported ops in metal build

93bbcadf

remove quantization for now

9ceb0025

store scales as native UE4M3, preserve original model bits when possible

4bfb3188

Update convert_hf_to_gguf.py

3b4ebe5b

correct comment

ad9d68a6

format

0717bfc4

reduce duplication and cleanup

bd4b67ce

Address comments

a3b0c749

move detection to prepare_tensors

396b8241

Use math instead of const

c21e1df8

Move

5cc15166

fix comment

f2b6dce1

Shelf quantize tests

a567d3ed

Rebase and move check

677eedb0

cleanup

7dead730

lint

e400ac78

Update gguf-py/gguf/scripts/gguf_convert_endian.py

52b25d3e

organize

b005f559

Refactor

db618aa7

richarddd force pushed from 93ab4d7a to db618aa7 24 days ago

ggerganov approved these changes on 2026-03-09

ggerganov requested a review from CISC 24 days ago

CISC approved these changes on 2026-03-09

Update convert_hf_to_gguf.py

2b465a6f

Update convert_hf_to_gguf.py

27c28316

Update convert_hf_to_gguf.py

d2b9d373

add quantize_nvfp4 (required for test_quants.py)

0870ec53

add quantize_nvfp4 (required for test_quants.py)

238a9125

add quantize_nvfp4 (required for test_quants.py)

215787e0

fix return type

dab2f826

ORippler dismissed these changes on 2026-03-10

Merge branch 'master' into feat/nvfp4

51f757c2

CISC dismissed their stale review 21 days ago

https://github.com/ggml-org/llama.cpp/pull/19769#issuecomment-4040742527

CISC merged 5eae9cb1 into master 21 days ago

richarddd deleted the feat/nvfp4 branch 21 days ago

Reviewers

CISC

ggerganov

ORippler

pwilkin

am17an

mishig25

compilade

0cc4m

JohannesGaessler

Labels

model testing Nvidia GPU Vulkan examples python ggml Apple Metal
