llama.cpp
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up)
#21636

Merged

ggerganov merged 7 commits into ggml-org:master from pl752:perf/q1_0_g128_no_nofma

Implemented optimized q1_0 dot for x86 and generic

195593bc

Removed redundant helper definition

e29cd486

pl752 marked this pull request as ready for review 58 days ago

pl752 requested a review from ggerganov 58 days ago

Removed two redundant instructions from AVX q1_0 dot

8587b5cc

github-actions added the ggml label

Fixed inconsistency with fp16 conversion for generic q1_0 dot and ded…

0c4fb41f

Style cleanup around AVX q1_0 dot

7f82cf0c

pl752 changed the title ~~(Performance; ggml-cpu) Optimized x86 and generic cpu q1_0 dot (follow up)~~ ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) 55 days ago

am17an approved these changes on 2026-04-13

Replaced explicitly unrolled blocks with inner for loop for q1_0

67f8d32d

Replaced scalar ARM q1_0 impl with new generic one

715f62ac

am17an approved these changes on 2026-04-15

ggerganov approved these changes on 2026-04-20

ggerganov merged 7f251fdb into master 46 days ago

Reviewers: ggerganov, am17an

Assignees: No one assigned

Labels: ggml

Milestone: No milestone
