llama.cpp
HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ
#22821
Closed


jiachengjason wants to merge 10 commits into ggml-org:master from leeliu103:mmvq_q4_x4_pr
jiachengjason
jiachengjason initial commit for q4_k_M x4 repacking
35fb3614
jiachengjason move extra tuning to next pr
811e01aa
jiachengjason remove unnecessary vec_dot_q4_K_q8_1_x4_with_rhs_pair path
d6c75b56
jiachengjason merge vec_dot_q4_K_q8_1_x4_x2 into vec_dot_q4_K_q8_1_x4
7be3065c
jiachengjason merge vec_dot_q4_K_q8_1_x4 into mul_mat_vec_q
b6d3ace3
jiachengjason move vec_dot_q4_K_q8_1_x4 to vecdotq.cuh
d78827f9
jiachengjason remove load_q4_K_block_header and extra comments
b2ff6bc8
jiachengjason inline structs and helper functions
63d52741
jiachengjason marked this pull request as ready for review 37 days ago
jiachengjason requested a review 37 days ago
jiachengjason changed the title from "Mmvq q4 x4 pr" to "HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ" 37 days ago
jiachengjason changed the title from "HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ" to "HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ" 37 days ago
JohannesGaessler
JohannesGaessler commented on 2026-05-07
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler
jiachengjason
jiachengjason generalize kernel for quantizing activation
2df0d335
jiachengjason remove unnecessary block_q8_1_x4
9c477fa7
jiachengjason
ORippler
JohannesGaessler
jiachengjason requested a review from JohannesGaessler 27 days ago
jiachengjason
ravel7524
jiachengjason
ravel7524
am17an
JohannesGaessler
jiachengjason
jiachengjason closed this 12 days ago
JohannesGaessler
