fix: use W_zero_point_is_uniform for PerColumnZeroPoints and rename depthwise fallback test

- Use W_zero_point_is_uniform for PerColumnZeroPoints so that uniform
per-channel ZPs use the faster scalar MLAS path.
- Rename Conv2D_S8S8_Depthwise_PerChannelZeroPoints to
Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints and add a comment
clarifying it validates the group-GEMM fallback path.