Ggml/cuda snake fusion hardening (#22912)
* cuda: tighten snake fusion type checks for all operands (defensive, sync vulkan)
* cuda: reject snake fusion when ne[2] or ne[3] > 1 (mirror vulkan PR review)
* cuda: merge type_ok and types_ok into a single types_ok (address am17an review)
* cuda: filter ADD/SUB/MUL/DIV in supports_op to F32/F16
bin_bcast only dispatches F32/F16 type triplets, mirror the
vulkan filter so unsupported types fall back through cpy
instead of aborting.
* test-backend-ops: extend snake_fuse to rank-4 with ne[2]/ne[3] > 1 cases