ggml: fix CUDA kernel launch configuration for k_compute_batched_ptrs to support large batches (#16744)
* fix k_compute_batched_ptrs
* add backend ops test
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* reduce the batch size used in the backend ops test
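
A minimal sketch of the general pattern the title suggests, not the actual ggml-cuda code: a pointer-setup kernel that would previously have been limited by the 1024-threads-per-block cap is instead launched over a 1D grid of fixed-size blocks, with a bounds check inside the kernel. The kernel signature, stride names, and buffer sizes below are simplified placeholders.

```cuda
// build (assumption): nvcc -o batched_ptrs_sketch batched_ptrs_sketch.cu
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

// Simplified stand-in for k_compute_batched_ptrs: fill per-matrix pointer
// arrays for a batched GEMM. The real kernel in ggml-cuda.cu also handles
// broadcast strides; only the launch configuration matters here.
static __global__ void k_compute_batched_ptrs_sketch(
        const char * src0, const char * src1, char * dst,
        const void ** ptrs_src0, const void ** ptrs_src1, void ** ptrs_dst,
        int64_t ne_batch, size_t nb02, size_t nb12, size_t nbd2) {
    const int64_t i = (int64_t) blockIdx.x*blockDim.x + threadIdx.x;
    if (i >= ne_batch) {
        return; // grid is rounded up, so guard against out-of-range indices
    }
    ptrs_src0[i] = src0 + i*nb02;
    ptrs_src1[i] = src1 + i*nb12;
    ptrs_dst [i] = dst  + i*nbd2;
}

int main() {
    const int64_t ne_batch = 4096; // exceeds the 1024 threads-per-block limit

    char *src0, *src1, *dst;
    const void **ptrs_src0, **ptrs_src1;
    void **ptrs_dst;
    cudaMalloc(&src0, ne_batch*64);
    cudaMalloc(&src1, ne_batch*64);
    cudaMalloc(&dst,  ne_batch*64);
    cudaMalloc(&ptrs_src0, ne_batch*sizeof(void *));
    cudaMalloc(&ptrs_src1, ne_batch*sizeof(void *));
    cudaMalloc(&ptrs_dst,  ne_batch*sizeof(void *));

    // A launch of the form <<<1, ne_batch>>> fails once ne_batch > 1024;
    // spread the batch over a 1D grid of fixed-size blocks instead.
    const int block_size = 256;
    const int num_blocks = (int) ((ne_batch + block_size - 1)/block_size);
    k_compute_batched_ptrs_sketch<<<num_blocks, block_size>>>(
        src0, src1, dst, ptrs_src0, ptrs_src1, ptrs_dst, ne_batch, 64, 64, 64);

    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel launch: %s\n", cudaGetErrorString(err));
    return 0;
}
```

The 64-bit flat index and the early return keep the kernel correct when the grid is rounded up, which is what allows the batch dimension to grow past a single block's thread limit.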
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>