ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739)
* Enabled q4_K_8x8_q8_K path on ARM
* wip: i8mm qs multiplication, pending bias
* cpu : arm : REPACK gemm q4_K_8x8 implementation
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Guarded gemm with proper feature checks, improved superblock scale and min calculation
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* cpu: arm: Implemented REPACK gemv for Q4_K
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Removed completed TODO
* Fixed missing guards when selecting optimal repack type for Q4_K
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fixed macro guard for gemv
* Fixed wrong comment in GEMV
* Fixed warning for unused variable
* vdotq_s32 -> ggml_vdotq_s32
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fixed clang-format issues
* Apply suggestions from code review
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Removed unnecessary GGML_UNUSED
* Fixed guards in q4_K gemm and gemv (repack)
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Diego Devesa <slarengh@gmail.com>