Fix WebGPU ConvTranspose shader bugs for non-vectorizable input channels (#27749)
## Description
Fix two bugs in the WebGPU ConvTranspose shader code generation
(`conv_backprop.cc`) in the `pack_input_as4_` code path when
`a_components_ == 1`. This branch is taken when the number of input
channels per group is odd, i.e. divisible by neither 4 nor 2 (e.g., 5 or 7).
### Bug 1: Wrong offset for weight reads
Weight values were read using `x_offset` (the input/dy tensor offset)
instead of `w_offset` (the weight tensor offset), producing incorrect
convolution results.
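
To illustrate why the offset mix-up corrupts results, here is a minimal Python sketch, not the actual shader code. The flat buffers, the `dot_at` helper, and the `buggy` flag are hypothetical; only the names `x_offset` and `w_offset` come from the fix.

```python
def dot_at(dy, w, x_offset, w_offset, n, buggy=False):
    """Dot product of n input values and n weights read from flat buffers.

    Hypothetical helper: `buggy=True` mimics reading the weight buffer
    with the input offset, as described in Bug 1.
    """
    total = 0.0
    for i in range(n):
        x = dy[x_offset + i]
        # Correct code indexes weights with w_offset, not x_offset.
        w_index = (x_offset if buggy else w_offset) + i
        total += x * w[w_index]
    return total


dy = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
w = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

correct = dot_at(dy, w, x_offset=2, w_offset=0, n=3)
wrong = dot_at(dy, w, x_offset=2, w_offset=0, n=3, buggy=True)
print(correct, wrong)
```

The buggy variant still reads valid memory, so it produces plausible-looking but incorrect values rather than an obvious failure.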
### Bug 2: Missing weight multiplication in remainder loop
The remainder loop (handling leftover channels when
`inputChannelsPerGroup` is not a multiple of 4) was adding raw input
values to `dotProd` without multiplying by the corresponding weight
values.
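
The effect of the missing multiplication can be sketched in Python as follows. This is an illustration under assumptions, not the generated WGSL: `dot_packed4` and its `buggy` flag are hypothetical, and the real shader operates on packed vectors rather than Python lists.

```python
def dot_packed4(x, w, buggy=False):
    """Dot product processed in chunks of 4 plus a remainder loop.

    Hypothetical helper: `buggy=True` mimics Bug 2, where the remainder
    loop adds raw input values without multiplying by the weights.
    """
    n = len(x)
    main = n - n % 4  # number of elements covered by full chunks of 4
    total = 0.0
    for i in range(0, main, 4):
        total += sum(x[i + j] * w[i + j] for j in range(4))
    for i in range(main, n):  # leftover channels
        total += x[i] if buggy else x[i] * w[i]
    return total


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # 5 channels: one full chunk, one leftover
w = [2.0, 2.0, 2.0, 2.0, 3.0]

fixed = dot_packed4(x, w)
broken = dot_packed4(x, w, buggy=True)
print(fixed, broken)
```

With a channel count that is a multiple of 4 the remainder loop never runs, so both variants agree.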
## Motivation and Context
The `inChannels = 5` and `inChannels = 7` test cases in
`conv-transpose.jsonc` were failing because these channel counts aren't
divisible by 2 or 4, triggering the buggy `a_components_ == 1` branch.
Cases like `inChannels = 6` (`a_components_ = 2`) and `inChannels = 8`
(`a_components_ = 4`) were unaffected.
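
The mapping from channel count to component width implied by these cases can be sketched as below. The function `components_for` is a hypothetical stand-in, not the actual selection code, and the example assumes a single group so that channels per group equals `inChannels`.

```python
def components_for(channels_per_group):
    """Pick the widest vector width (4, 2, or 1) that divides the count.

    Hypothetical helper mirroring the cases listed above.
    """
    if channels_per_group % 4 == 0:
        return 4
    if channels_per_group % 2 == 0:
        return 2
    return 1


# Assumes group == 1, so channels per group equals inChannels.
widths = {c: components_for(c) for c in (5, 6, 7, 8)}
print(widths)
```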
## Testing
All 22 conv-transpose WebGPU tests now pass:
```
npm test -- op conv-transpose.jsonc -b=webgpu -e=node
22 passing (23s)
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>