opencl: add optimized q4_1 mm kernel for adreno (#19840)
* Add Q4_1 OpenCL Kernels
* opencl: refactor transpose
* opencl: format
* opencl: refactor q4_1 unpack
* opencl: move `ggml_cl_mul_mat_q4_1_f32_adreno`
* opencl: refactor `ggml_cl_mul_mat_q4_1_f32_adreno` and kernels
* opencl: rename kernel files and kernels
* opencl: fix build for non-adreno
* opencl: move code around and format
---------
Co-authored-by: Li He <lih@qti.qualcomm.com>