HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ #22821
initial commit for q4_k_M x4 repacking
35fb3614
move extra tuning to next PR
811e01aa
remove unnecessary vec_dot_q4_K_q8_1_x4_with_rhs_pair path
d6c75b56
merge vec_dot_q4_K_q8_1_x4_x2 into vec_dot_q4_K_q8_1_x4
7be3065c
merge vec_dot_q4_K_q8_1_x4 into mul_mat_vec_q
b6d3ace3
move vec_dot_q4_K_q8_1_x4 to vecdotq.cuh
d78827f9
remove load_q4_K_block_header and extra comments
b2ff6bc8
inline structs and helper functions
63d52741
jiachengjason
marked this pull request as ready for review 37 days ago
jiachengjason
changed the title from "Mmvq q4 x4 pr" to "HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ" 37 days ago
jiachengjason
changed the title from "HIP: Adds 4x packed Q8_1 activation (q8_1_x4 MMVQ path) for Q4_K_M models in MMVQ" to "HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ" 37 days ago
generalize kernel for quantizing activation
2df0d335
remove unnecessary block_q8_1_x4
9c477fa7