opencl: add cumsum op (#18981)
* OpenCL: add CUMSUM op support
* remove unused argument
* opencl: refactor cumsum
* opencl: refactor
* opencl: refactor tmp buffer
* opencl: adjust max number of subgroups
* opencl: fix whitespace
* opencl: fix global size when running cumsum on the tmp buffer
---------
Co-authored-by: Li He <lih@qti.qualcomm.com>