llama.cpp
ggml : add NVFP4 quantization type support for metal
#20060
Open
richarddd wants to merge 44 commits into ggml-org:master from richarddd:feat/nvfp4-metal
WIP: add NVFP4 quantization support
f5a137d7
tests
8b4e790e
improve NVFP4 dot product implementation performance and fix bad sup…
e3e13301
typo
cfe06795
Use nvfp4 kvalues
984aaee7
vulkan : fix NVFP4 shader compilation by including kvalues_mxfp4 look…
cd84cc3d
vulcal and perf fixes
befad80d
wip
cf1d533a
Fix metal
7c730baf
fix vulcan
622a6e8f
Rename threshold & fix wrong scale
3c6f4cae
Fix MOE
06e14c5c
Shelf backend implementations (CUDA, Metal, Vulkan, arch-specific SIMD)
04870346
Fix arch-fallback.h: add NVFP4 generic fallback for all platforms
a8f8fbaa
quantize: add NVFP4 as a quantization type option
fe52c511
Fix ggml_fp32_to_ue4m3: handle subnormal values
4f232bee
Restore ARM NEON NVFP4 dot product implementation
dc5a0228
Optimize ARM NEON NVFP4 dot product: LUT + vpaddq + vfmaq
b99855e6
ARM NEON NVFP4: rearrange q8 to match nibble layout
5951d107
CPU only backend 64 super-block layout
36491e40
cleanup
fa018357
Remove unused LUT
68a6e2d7
int
ee52fdd1
exclude NVFP4 from unsupported ops in metal build
81218b23
remove quantization for now
a27ee0d6
store scales as native UE4M3, preserve original model bits when possible
73bd0f4f
Update convert_hf_to_gguf.py
d6d3368b
correct comment
b0c75e22
format
a26f2fc7
reduce duplication and cleanup
6e434346
Address comments
52b9baa3
move detection to prepare_tensors
0519bfcc
Use math instead of const
2009a9c2
Move
733cac68
fix comment
ff1eec6d
Shelf quantize tests
3f97de2f
Rebase and move check
b5912f25
cleanup
fa669191
lint
9fa4ddc1
Update gguf-py/gguf/scripts/gguf_convert_endian.py
f75235a1
Use fallback quant config
27cf7483
Metal support for NVFP4
0964096c
richarddd requested a review from ggerganov 6 days ago
richarddd requested a review from CISC 6 days ago
github-actions added testing
github-actions added python
github-actions added ggml
github-actions added Apple Metal
These should not be shelved
2e96fb42
Format
da39e58e
ggerganov marked this pull request as draft 5 days ago
Reviewers: ggerganov, CISC
Assignees: No one assigned
Labels: testing, python, ggml, Apple Metal
Milestone: No milestone