llama.cpp
HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ
#22821

Closed

jiachengjason wants to merge 10 commits into ggml-org:master from leeliu103:mmvq_q4_x4_pr

initial commit for q4_k_M x4 repacking

35fb3614

move extra tuning to next pr

811e01aa

remove unnecessary vec_dot_q4_K_q8_1_x4_with_rhs_pair path

d6c75b56

merge vec_dot_q4_K_q8_1_x4_x2 into vec_dot_q4_K_q8_1_x4

7be3065c

merge vec_dot_q4_K_q8_1_x4 into mul_mat_vec_q

b6d3ace3

move vec_dot_q4_K_q8_1_x4 to vecdptq.cuh

d78827f9

remove load_q4_K_block_header and extra comments

b2ff6bc8

inline structs and helper functions

63d52741

jiachengjason marked this pull request as ready for review 37 days ago

jiachengjason requested a review 37 days ago

jiachengjason changed the title ~~Mmvq q4 x4 pr~~ HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ 37 days ago

jiachengjason changed the title ~~HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ~~ HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ 37 days ago

JohannesGaessler commented on 2026-05-07

github-actions added Nvidia GPU

github-actions added ggml

generalize kernel for quantizing activation

2df0d335

remove unnecessary block_q8_1_x4

9c477fa7

jiachengjason requested a review from JohannesGaessler 27 days ago

jiachengjason closed this 12 days ago

Reviewers

JohannesGaessler

Assignees

No one assigned

Labels

Nvidia GPU ggml

Milestone

No milestone
