[webgpu] Optimize generic 4D Transpose using OIHW2OHWI Program (#26942)
### Description
This PR moves the `OIHW2OHWI` Program out of `Im2ColMatMul` and into the
`Transpose` operator. Centralizing the logic lets `Transpose` use the
specialized shader for generic 4D transpositions with the {0, 2, 3, 1}
permutation (the same reordering as OIHW to OHWI) and removes duplicated
code.
The shader can also support 2D/3D transpositions, but those
optimizations are left for follow-up PRs.
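
For reference, the {0, 2, 3, 1} permutation can be sketched on the CPU
as below. This is only an illustration of the index mapping, not the
WebGPU shader from this PR; `TransposeOIHWToOHWI` is a hypothetical
helper name and assumes a dense, row-major `float` buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of ONNX Runtime.
// Reorders a dense row-major [O, I, H, W] tensor into [O, H, W, I],
// i.e. applies the permutation {0, 2, 3, 1}.
std::vector<float> TransposeOIHWToOHWI(const std::vector<float>& src,
                                       const std::array<std::size_t, 4>& dims) {
  const std::size_t O = dims[0], I = dims[1], H = dims[2], W = dims[3];
  assert(src.size() == O * I * H * W);

  std::vector<float> dst(src.size());
  for (std::size_t o = 0; o < O; ++o) {
    for (std::size_t i = 0; i < I; ++i) {
      for (std::size_t h = 0; h < H; ++h) {
        for (std::size_t w = 0; w < W; ++w) {
          // Source offset in OIHW order.
          const std::size_t src_idx = ((o * I + i) * H + h) * W + w;
          // Destination offset in OHWI order: I becomes the innermost axis.
          const std::size_t dst_idx = ((o * H + h) * W + w) * I + i;
          dst[dst_idx] = src[src_idx];
        }
      }
    }
  }
  return dst;
}
```

With dims {1, 2, 1, 2}, the two input channels are interleaved so that
the channel index becomes the fastest-varying one in the output.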
### Motivation and Context
See above.