opencl: add optimized q8_0 mm kernel for adreno (#18871)
* Add Q8_0 OpenCL kernel
Co-authored-by: yunjie <yunjie@qti.qualcomm.com>
* opencl: fix build for non-adreno
* opencl: refactor q8_0
* opencl: enforce subgroup size of 64 for adreno for q8_0
* For A750 and older generations, the subgroup size can be 64 or 128;
  this kernel assumes a subgroup size of 64.
* opencl: suppress warning when adreno kernels are disabled
---------
Co-authored-by: yunjie <yunjie@qti.qualcomm.com>
Co-authored-by: Li He <lih@qti.qualcomm.com>