llama.cpp
8141e730 - ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753)

Commit

1 day ago

ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753)

* ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul

This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. Process the final K panel using its actual depth and pass the reduced panel size through packing and kernel execution. This allows more workloads to use the MMA kernel and reduces fallback to mnpack.

* Apply suggestion from @taronaeo

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
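The K-tail handling described in the commit message can be illustrated with a minimal sketch. This is not the tinyBlas_Q0_PPC code: the function name `matmul_k_panels`, the plain `int32_t` row-major matrices (instead of quantized Q8/Q4 blocks), and the scalar inner loop (instead of Power10 MMA instructions) are all assumptions made for illustration. It only shows the general idea of walking K in panels of depth `kc` and giving the final panel its actual depth, with that reduced depth passed through both the packing step and the kernel step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration: C (m x n) += A (m x k) * B (k x n), row-major.
// K is processed in panels of depth kc; k does NOT need to be a multiple of kc.
void matmul_k_panels(const std::vector<int32_t>& A,
                     const std::vector<int32_t>& B,
                     std::vector<int32_t>& C,
                     int m, int n, int k, int kc) {
    assert(kc > 0);
    for (int k0 = 0; k0 < k; k0 += kc) {
        // Actual depth of this panel: kc for full panels, k - k0 for the tail.
        const int kd = std::min(kc, k - k0);

        // "Packing": copy only kd columns of A and kd rows of B.
        std::vector<int32_t> a_panel(static_cast<size_t>(m) * kd);
        std::vector<int32_t> b_panel(static_cast<size_t>(kd) * n);
        for (int i = 0; i < m; ++i)
            for (int p = 0; p < kd; ++p)
                a_panel[i * kd + p] = A[i * k + (k0 + p)];
        for (int p = 0; p < kd; ++p)
            for (int j = 0; j < n; ++j)
                b_panel[p * n + j] = B[(k0 + p) * n + j];

        // "Kernel": accumulate over the same reduced depth kd.
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                int32_t acc = 0;
                for (int p = 0; p < kd; ++p)
                    acc += a_panel[i * kd + p] * b_panel[p * n + j];
                C[i * n + j] += acc;
            }
        }
    }
}
```

With k = 5 and kc = 2, for example, the loop visits panels of depth 2, 2, and 1. A version that required K to be divisible by kc would have to reject this shape and use a different code path.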

References

#24753 - ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul

Author

shalinib-ibm

Parents

db52540f
