[webgpu] support arbitrary input_channel size for im2col (#27038)
### Description
This PR adds `vec1`/`vec2` support to the `im2col` kernel so that it
handles arbitrary `input_channel` sizes, which extends the performance
gain to more models.
For example, the `yolov8n_pose` model gets about a **7%** gain overall,
and about **50%** for the `conv2d` ops whose `input_size` is not a
multiple of 4.
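
As a rough illustration of the idea (not the code in this PR), the vector width could be chosen as the widest of 4, 2, or 1 that evenly divides the channel count. The helper name `pick_vector_width` and the exact selection rule are assumptions made for this sketch; the actual kernel may select the width differently.

```python
def pick_vector_width(input_channel: int) -> int:
    """Hypothetical helper: return the widest of 4, 2, 1 that divides input_channel.

    This is an assumed selection rule for illustration only, not the
    logic used by the real im2col kernel.
    """
    if input_channel <= 0:
        raise ValueError("input_channel must be positive")
    # Try the widest vector first so channel counts that are multiples of 4
    # keep using 4-wide loads, and other counts fall back to 2 or 1.
    for width in (4, 2, 1):
        if input_channel % width == 0:
            return width
    raise AssertionError("unreachable: 1 divides every positive integer")
```

Under this assumed rule, a channel count of 6 would map to a 2-wide vector and a channel count of 3 to scalar (1-wide) access, so neither is excluded from the `im2col` path.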
### Motivation and Context