[js/webgpu] Allow binary ops with scalar to use the vectorize path (#17589)
### Description
1. For binary ops, the components is always 4. So the dispatchGroup
should be : `{x: Math.ceil(outputSize / 64 /* workgroup size */ / 4 /*
component size */)}` instead of `{x: Math.ceil(outputSize / 64 /*
workgroup size */ / (vectorize ? 4 : 1) /* vec size */)}`.
2. If any of a or b only has one element, we still can use the vectorize
path since the same value will be broadcasted.