[webgpu] Require K to be divisible by 128 to apply dp4a matmul (#24078)
The DP4AMatMulQuantize shader assumes that K is divisible by 128.
Otherwise, the scale would need to be aligned to have shape
[M, ceil(K / 128)]. To keep the shader simple, the dp4a matmul path is
now applied only when K is divisible by 128.
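
For illustration only, a minimal sketch of the condition described above,
assuming a block size of 128 along K. The names BLOCK_K,
can_apply_dp4a_matmul, and aligned_scale_shape are hypothetical and are not
taken from the ONNX Runtime source; the actual check may include other
conditions.

```python
# Hypothetical sketch; names here are illustrative, not the real implementation.
BLOCK_K = 128  # K-block size that the quantize shader is described as assuming


def can_apply_dp4a_matmul(k: int) -> bool:
    """Return True only when K is a positive multiple of BLOCK_K."""
    return k > 0 and k % BLOCK_K == 0


def aligned_scale_shape(m: int, k: int) -> tuple:
    """Scale shape [M, ceil(K / 128)] that a non-divisible K would require."""
    blocks = (k + BLOCK_K - 1) // BLOCK_K  # integer ceiling division
    return (m, blocks)
```

With K = 200, the gate returns False and the aligned scale shape for M = 4
would be (4, 2), which is the extra handling this change avoids.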