opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181)
* Add mul_mm_f16_f32_kq_kqv kernel
* Add ggml_cl_mul_mat_kq_kqv_adreno func
* fix whitespace
* remove unused variable
* remove redundant
* refactor and clean up
* remove trailing whitespace