WebGPU: Support Split-K with batch size > 1 (#28151)
### Description
This patch adds support for Split-K with batch size > 1 by
encoding both the batch index and the Split-K index in `dispatch_z`
and decomposing them in the shader via:
`batch = logical_global_id.z / num_k_splits`
`split_index = logical_global_id.z % num_k_splits`
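
For reviewers, the index arithmetic above can be checked on the host with a minimal C++ sketch. The helper names `EncodeDispatchZ` and `DecodeZ` and the struct `BatchSplit` are hypothetical and do not appear in the patch. The sketch assumes `dispatch_z` equals `batch_size * num_k_splits`, which is what the decomposition implies.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; the real decomposition
// happens in the generated WGSL shader, not in host code.
struct BatchSplit {
  uint32_t batch;        // which batch this invocation belongs to
  uint32_t split_index;  // which K-slice of that batch it handles
};

// Assumption: one z slot per (batch, split) pair.
constexpr uint32_t EncodeDispatchZ(uint32_t batch_size, uint32_t num_k_splits) {
  return batch_size * num_k_splits;
}

// Mirrors the shader-side formulas:
//   batch       = z / num_k_splits
//   split_index = z % num_k_splits
constexpr BatchSplit DecodeZ(uint32_t z, uint32_t num_k_splits) {
  return BatchSplit{z / num_k_splits, z % num_k_splits};
}
```

With this layout, all splits of one batch occupy consecutive z values, so every z in `[0, batch_size * num_k_splits)` maps to exactly one `(batch, split_index)` pair.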
This patch also adds batch size to the criteria for using Split-K,
since a larger batch size already increases parallelism and
reduces the benefit of Split-K.
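
The reasoning behind the batch-size criterion can be illustrated with a rough C++ sketch. Every constant and the function `ShouldUseSplitK` are hypothetical placeholders, not the thresholds or names used in this patch. The only idea taken from the description is that batch size multiplies the number of workgroups available without splitting K.

```cpp
#include <cstdint>

// Hypothetical placeholder values; NOT the thresholds used by the patch.
constexpr uint32_t kTileSize = 32;                   // assumed output tile edge
constexpr uint32_t kMinK = 2048;                     // assumed minimum K worth splitting
constexpr uint32_t kMaxWorkgroupsWithoutSplit = 64;  // assumed "GPU underused" limit

constexpr uint32_t CeilDiv(uint32_t a, uint32_t b) {
  return (a + b - 1) / b;
}

// Split-K is only considered when K is large and the output tiles,
// multiplied by the batch size, yield too few workgroups.
constexpr bool ShouldUseSplitK(uint32_t batch_size, uint32_t m, uint32_t n, uint32_t k) {
  return k >= kMinK &&
         batch_size * CeilDiv(m, kTileSize) * CeilDiv(n, kTileSize) <=
             kMaxWorkgroupsWithoutSplit;
}
```

Under these assumed numbers, a 64x64 output with K = 4096 qualifies at batch size 1 (4 workgroups) but not at batch size 32 (128 workgroups).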
This patch also replaces `consteval` with `constexpr` in
`ort_version_check.h` to work around a compilation error
with VS2022.
### Motivation and Context
With this patch, the performance of
`sam-vit-b-decoder-static-fp16-demo` improves by 7.5% on Intel PTL.