llama.cpp
ggml : add NVFP4 quantization type support for metal
#20060

Open

richarddd wants to merge 44 commits into ggml-org:master from richarddd:feat/nvfp4-metal

WIP: add NVFP4 quantization support

f5a137d7

tests

8b4e790e

improve NVFP4 dot product implementation performance and fix bad sup…

e3e13301

typo

cfe06795

Use nvfp4 kvalues

984aaee7

vulkan : fix NVFP4 shader compilation by including kvalues_mxfp4 look…

cd84cc3d

vulkan and perf fixes

befad80d

wip

cf1d533a

Fix metal

7c730baf

fix vulkan

622a6e8f

Rename threshold & fix wrong scale

3c6f4cae

Fix MOE

06e14c5c

Shelf backend implementations (CUDA, Metal, Vulkan, arch-specific SIMD)

04870346

Fix arch-fallback.h: add NVFP4 generic fallback for all platforms

a8f8fbaa

quantize: add NVFP4 as a quantization type option

fe52c511

Fix ggml_fp32_to_ue4m3: handle subnormal values

4f232bee

Restore ARM NEON NVFP4 dot product implementation

dc5a0228

Optimize ARM NEON NVFP4 dot product: LUT + vpaddq + vfmaq

b99855e6

ARM NEON NVFP4: rearrange q8 to match nibble layout

5951d107

CPU only backend 64 super-block layout

36491e40

cleanup

fa018357

Remove unused LUT

68a6e2d7

int

ee52fdd1

exclude NVFP4 from unsupported ops in metal build

81218b23

remove quantization for now

a27ee0d6

store scales as native UE4M3, preserve original model bits when possible

73bd0f4f

Update convert_hf_to_gguf.py

d6d3368b

correct comment

b0c75e22

format

a26f2fc7

reduce duplication and cleanup

6e434346

Address comments

52b9baa3

move detection to prepare_tensors

0519bfcc

Use math instead of const

2009a9c2

Move

733cac68

fix comment

ff1eec6d

Shelf quantize tests

3f97de2f

Rebase and move check

b5912f25

cleanup

fa669191

lint

9fa4ddc1

Update gguf-py/gguf/scripts/gguf_convert_endian.py

f75235a1

Use fallback quant config

27cf7483

Metal support for NVFP4

0964096c

richarddd requested a review from ggerganov 6 days ago

richarddd requested a review from CISC 6 days ago

github-actions added testing

github-actions added python

github-actions added ggml

github-actions added Apple Metal

These should not be shelved

2e96fb42

Format

da39e58e

ggerganov marked this pull request as draft 5 days ago

Reviewers

ggerganov

CISC

Assignees

No one assigned

Labels

testing, python, ggml, Apple Metal

Milestone

No milestone
