[webgpu] fix 2 bugs in Conv/ConvTranspose (#24388)
### Description
- fix a bug in ConvTranspose
This bug causes `input_channels_per_group_int` to be `-3` for a test
case, which later results in a loop of `4294967293` iterations
(`uint32_t(-3)`) and a timeout.
- fix cache hint of Conv2dMMProgram
After fixing the bug in ConvTranspose, more cache hint inconsistencies
were revealed. This change adds the missing channel_last to the cache
hint of Conv2dMMProgram.
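
For reference, a minimal standalone sketch of the wraparound behind the ConvTranspose timeout. `LoopTripCount` is a hypothetical helper written only for this illustration; it is not code from the ConvTranspose implementation, and it only assumes that the negative value ends up converted to `uint32_t` as described above.

```cpp
#include <cstdint>

// Hypothetical helper (not from the actual kernel): shows what happens when a
// negative signed channel count is converted to an unsigned 32-bit loop bound.
// Conversion to an unsigned type is defined as reduction modulo 2^32.
std::uint32_t LoopTripCount(int input_channels_per_group_int) {
  return static_cast<std::uint32_t>(input_channels_per_group_int);
}
```

With an input of `-3`, the result is `2^32 - 3 = 4294967293`, which matches the loop count reported above.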