[WebGPU EP] fixes bugs in NCHW version of instance norm operator (#25092)
The instance norm operator suffered from the following issues that this
PR addresses:
1. A tensor of shape {2, 80, 2} holds 320 numbers. Its logical shape
with vec2 elements is {2, 80, 1}, which also holds 320 numbers. The
InstanceNorm\<false\> code path was not passing the logical shape into
the shader generation function, causing incorrect output.
2. The output_size was being divided by components, which reduced the
number of workers dispatched. With components=4, 75% of the outputs on
the InstanceNorm\<false\> code path were never written and remained 0,
causing incorrect results.
3. All the tests, including ones explicitly marked NCHW, were being run
on the preferred data layout (NHWC).
4. Typos and some implicit typing were fixed as well.
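
The arithmetic behind items 1 and 2 can be sketched in a few lines of
Python. This is an illustration only, not the ONNX Runtime code: the
helper names (logical_shape, scalar_count, covered_fraction) are
hypothetical, and the coverage model assumes each dispatched worker
writes exactly one output element.

```python
from math import prod


def logical_shape(shape, components):
    """Shape seen by a shader whose elements are vectors of `components` scalars.

    Hypothetical helper: folds `components` scalars of the last dimension
    into one vector element.
    """
    if shape[-1] % components != 0:
        raise ValueError("last dimension must be divisible by components")
    return (*shape[:-1], shape[-1] // components)


def scalar_count(shape, components=1):
    """Total number of scalars held by a tensor of vector elements."""
    return prod(shape) * components


def covered_fraction(output_size, workers):
    """Fraction of outputs written, ASSUMING one output element per worker."""
    return min(workers, output_size) / output_size


tensor_shape = (2, 80, 2)
vec2_shape = logical_shape(tensor_shape, 2)

# Item 1: both views describe the same 320 scalars.
print(vec2_shape, scalar_count(tensor_shape), scalar_count(vec2_shape, 2))

# Item 2: dividing the dispatch size by components=4 leaves 75% unwritten.
output_size = 320
print(covered_fraction(output_size, output_size // 4))
```

Under these assumptions, the shader must be generated against the
logical shape, and the dispatch size must cover every output element
rather than output_size / components.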
P.S. Fixes pyannote model