hexagon: further optimization and tuning of matmul and dot kernels (llama/19407)
* ggml-hexagon: implement 2x2 matmul kernel
* hexmm: implement vec_dot_rx2x2 for Q8_0 and MXFP4
* hexagon: fix editorconfig failures
* hexagon: refactor matmul ops to use context struct and remove wrappers
  Also implement vec_dot_f16 2x2
* hexagon: refactor dyn quantizers to use mmctx
* hexagon: remove mm fastdiv from op_ctx
* hexagon: refactor matmul entry point to reduce code duplication
---------
Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>