[WebGPU] DequantizeLinear op fixes (#27706)
### Description
Fix two issues that surface as test failures in
`js/web/test/data/ops/dequantizelinear.jsonc`.
1. When `component=4`, output shapes whose last dimension is not
divisible by `component` were not handled and failed with:
`onnxruntime/core/providers/webgpu/program.cc:247 TensorShape
onnxruntime::webgpu::(anonymous namespace)::GetReducedShape(const
TensorShape &, int) shape.NumDimensions() > 0 &&
shape.GetDims()[shape.NumDimensions() - 1] % component == 0 was false.
Cannot reduce shape {2,2} by component=4`
Added `ProgramOutput::Flatten` to the output definition to address this.
2. Fix handling of zero point in blocked quantization path.
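To illustrate item 1, here is a minimal standalone sketch of the divisibility check quoted in the error message and of what flattening changes. It is not the ONNX Runtime code: `CanReduceLastDim` and `FlattenAndReduce` are hypothetical names, and treating the flattened output as a 1-D shape of `ceil(total / component)` elements is an assumption about what `ProgramOutput::Flatten` does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the condition in the error message:
// the last dimension must be divisible by `component`.
bool CanReduceLastDim(const std::vector<int64_t>& dims, int component) {
  assert(component > 0);
  return !dims.empty() && dims.back() % component == 0;
}

// Hypothetical flattened view: ignore the per-dimension layout and
// group the total element count by `component`, rounding up.
// (Rounding up is an assumption made for this sketch.)
std::vector<int64_t> FlattenAndReduce(const std::vector<int64_t>& dims, int component) {
  assert(component > 0);
  int64_t total = 1;
  for (int64_t d : dims) total *= d;
  return {(total + component - 1) / component};
}
```

With shape `{2,2}` and `component=4`, the per-dimension check fails because `2 % 4 != 0`, while the flattened view holds 4 elements and maps to a single group of 4.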
Also renamed some test cases to have more descriptive names.
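For context on item 2, the ONNX definition of DequantizeLinear is `y = (x - x_zero_point) * x_scale`, and in blocked quantization each block along the quantization axis has its own scale and zero point. The following is a CPU reference sketch of that formula for a 1-D input, not the WebGPU shader from this change. The function name `DequantizeBlocked1D`, the `uint8_t` input type, and the convention that an empty `zero_point` means zero are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation of blocked dequantization on a
// 1-D tensor: element i uses the scale and zero point of block i / block_size.
std::vector<float> DequantizeBlocked1D(const std::vector<uint8_t>& x,
                                       const std::vector<float>& scale,
                                       const std::vector<uint8_t>& zero_point,
                                       std::size_t block_size) {
  assert(block_size > 0);
  const std::size_t num_blocks = (x.size() + block_size - 1) / block_size;
  assert(scale.size() == num_blocks);
  // An empty zero_point vector stands for "no zero point" (treated as 0).
  assert(zero_point.empty() || zero_point.size() == num_blocks);

  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::size_t block = i / block_size;
    const int zp = zero_point.empty() ? 0 : static_cast<int>(zero_point[block]);
    // Subtract in a signed type so x < zp does not wrap around.
    y[i] = static_cast<float>(static_cast<int>(x[i]) - zp) * scale[block];
  }
  return y;
}
```

A reference like this can be used to compute expected values for test cases: the zero point is looked up per block, in the same way as the scale, rather than per element or per tensor.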
### Motivation and Context
Fix some issues with the WebGPU DequantizeLinear op implementation.