[webgpu] Manually normalize `Conv-Transpose` dispatch group size for Intel GPUs (#25908)
### Description
This PR improves the performance of `Conv-Transpose` by ~12x on Lunar
Lake (LNL).
For one specific shape, (3, 3, 2560, 1280), the default normalization
produced a dispatch group size of (679, 679, 1), which was extremely
slow on LNL (likely due to a driver issue). Manually normalizing the
dispatch group size to (5, 640, 160) instead gives a ~12x performance
improvement on LNL.
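
For illustration only, here is a minimal Python sketch of how a square-root style default normalization could arrive at (679, 679, 1). It is not the code in this PR. The function name `normalize_dispatch`, the fallback order (square root, then cube root), and the flat workgroup count of 460800 are assumptions chosen because they reproduce the size reported above. The limit of 65535 is the WebGPU default for `maxComputeWorkgroupsPerDimension`.

```python
import math

# WebGPU default for maxComputeWorkgroupsPerDimension.
DEFAULT_LIMIT = 65535


def ceil_sqrt(n: int) -> int:
    """Smallest integer r with r * r >= n."""
    r = math.isqrt(n)
    return r if r * r >= n else r + 1


def ceil_cbrt(n: int) -> int:
    """Smallest integer r with r ** 3 >= n."""
    r = max(round(n ** (1.0 / 3.0)), 1)
    while r ** 3 < n:
        r += 1
    while r > 1 and (r - 1) ** 3 >= n:
        r -= 1
    return r


def normalize_dispatch(x: int, y: int, z: int, limit: int = DEFAULT_LIMIT):
    """Hypothetical default normalization (an assumption, not the PR's code).

    If every dimension fits within the limit, keep the size as is.
    Otherwise spread the total over a square, or a cube if the square
    side would still exceed the limit.
    """
    if x <= limit and y <= limit and z <= limit:
        return (x, y, z)
    total = x * y * z
    side = ceil_sqrt(total)
    if side <= limit:
        return (side, side, 1)
    side = ceil_cbrt(total)
    return (side, side, side)


# Assumed flat workgroup count; it exceeds the per-dimension limit.
flat_groups = 460800
print(normalize_dispatch(flat_groups, 1, 1))  # (679, 679, 1)
```

Under these assumptions the square layout launches 679 * 679 = 461041 workgroups, while (5, 640, 160) launches 512000, so the reported speedup does not come from launching fewer workgroups.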
### Motivation and Context
See above.
Co-authored-by: Yang, Wenqin <wenqin.yang@intel.com>