llama.cpp
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations #17977
Merged

max-krasnyansky merged 4 commits into ggml-org:master from ngdxzy:real_q8_0
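For context, Q8_0 is ggml's block quantization format for 8-bit data: values are grouped 32 to a block, each block stores a single scale plus 32 signed 8-bit quants, and dequantization is simply scale times quant. The sketch below is a plain scalar illustration of that scheme, not the vectorized HVX kernel this PR adds; the struct uses a float scale for readability (ggml stores it as fp16), and the type and function names here are invented for the example.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Scalar sketch of Q8_0 block quantization: 32 values per block, one scale,
// int8 quants chosen so that x ≈ d * qs[i]. Names are illustrative only.
constexpr int QK8_0 = 32;

struct BlockQ8_0 {
    float  d;            // per-block scale (stored as fp16 in ggml proper)
    int8_t qs[QK8_0];    // quantized values
};

// Quantize n floats (n must be a multiple of QK8_0) into Q8_0 blocks.
static std::vector<BlockQ8_0> quantize_q8_0(const float * x, int n) {
    std::vector<BlockQ8_0> out(n / QK8_0);
    for (size_t b = 0; b < out.size(); ++b) {
        float amax = 0.0f;                           // max |x| within the block
        for (int i = 0; i < QK8_0; ++i) {
            amax = std::fmax(amax, std::fabs(x[b*QK8_0 + i]));
        }
        const float d  = amax / 127.0f;              // scale so quants fit in int8
        const float id = d != 0.0f ? 1.0f/d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < QK8_0; ++i) {
            out[b].qs[i] = (int8_t) std::lround(x[b*QK8_0 + i] * id);
        }
    }
    return out;
}
```

Judging from the title, "true Q8_0" means the NPU-side quantization of FP32 activations now follows this per-32-value block scheme rather than a coarser grouping, which is where the accuracy gain in mixed-precision matmuls comes from.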
ngdxzy added commit 9148eaaa: feat: implement real Q8_0
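The accuracy benefit is easiest to see in the matmul inner loop: once both operands are in Q8_0-style blocks, each block pair contributes an integer dot product scaled by the product of the two block scales, so quantization error stays bounded per group of 32 values. A scalar illustration, reusing the hypothetical BlockQ8_0 type from the sketch above (again, not the actual Hexagon/HVX code):

```cpp
// Scalar dot product of two rows already quantized to Q8_0 blocks.
// Each 32-element block pair contributes d_a * d_b * (int8 dot product).
static float dot_q8_0(const BlockQ8_0 * a, const BlockQ8_0 * b, int nblocks) {
    float sum = 0.0f;
    for (int blk = 0; blk < nblocks; ++blk) {
        int32_t isum = 0;   // 32 * 127 * 127 fits comfortably in int32
        for (int i = 0; i < QK8_0; ++i) {
            isum += (int32_t) a[blk].qs[i] * (int32_t) b[blk].qs[i];
        }
        sum += a[blk].d * b[blk].d * (float) isum;
    }
    return sum;
}
```

Vectorized kernels typically map this per-block structure onto wide integer multiply-accumulate instructions and apply the float scales only once per block.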
ngdxzy requested a review from max-krasnyansky 16 days ago
ngdxzy requested a review from lhez 16 days ago
ngdxzy changed the title from "ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU (add q8x1 / q8x2 paths)" to "ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU (add q8x1 / q8x2 paths) for more accurate mixed-precision matmul operations" 16 days ago
ngdxzy then changed the title to "ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations", dropping the "(add q8x1 / q8x2 paths)" note, 16 days ago
github-actions added the ggml label
chraac commented on 2025-12-13
Commit 44a309aa: Merge branch 'master' of github.com:ngdxzy/llama.cpp into real_q8_0
ngdxzy added commit 172c3fc2: feat: adding cmake option for configuring FP32 quantize group size
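This commit indicates the FP32 quantization group size is now a build-time knob. In CMake, a non-boolean value like this is declared with set(... CACHE STRING ...) rather than option(), which only handles ON/OFF booleans; that also fits the later "typo: set() shall be used" fix. The snippet below only sketches that pattern: the variable name, default, and target name are hypothetical, not the ones used in the PR.

```cmake
# Hypothetical build-time knob for the FP32 quantize group size (names invented
# for illustration; see the PR's actual CMakeLists changes for the real option).
set(GGML_HEXAGON_FP32_QUANT_GROUP_SIZE "32" CACHE STRING
    "Group size used when quantizing FP32 activations on the Hexagon NPU")

# Forward the value to the sources as a preprocessor define.
target_compile_definitions(ggml-hexagon PRIVATE
    GGML_HEXAGON_FP32_QUANT_GROUP_SIZE=${GGML_HEXAGON_FP32_QUANT_GROUP_SIZE})
```

Such a cache variable would then be overridden at configure time with something like cmake -DGGML_HEXAGON_FP32_QUANT_GROUP_SIZE=64 (again, a hypothetical flag name).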
max-krasnyansky approved these changes on 2025-12-18
github-actions added the documentation label
max-krasnyansky added commit 60d04d68: typo: set() shall be used
max-krasnyansky merged commit ce734a8a into master 9 days ago
