Extend DQ→MatMulNBits fusion to support Gemm + per-tensor/per-channel quantization (#27769)
Extends the QDQ `DQMatMulToMatMulNBits` fusion to handle additional
quantization patterns beyond the existing blockwise DQ→MatMul case.
### New support
- **Gemm**: Fuses DQ→Gemm (with optional bias, including DQ bias) into
MatMulNBits, stripping Gemm-specific attributes (`alpha`, `beta`,
`transB`).
- **Per-tensor & per-channel quantization**: Expands scalar/1-D scales
and zero-points into the block-quantized layout expected by
MatMulNBits (see the sketch after this list). Block size is
configurable via `session.qdq_matmulnbits_block_size` (default: 32).
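
For intuition, here is a minimal numpy sketch of that expansion step, assuming MatMulNBits takes one scale per (output column, K-block) pair stored as a flat initializer; the function name and shapes are illustrative, not the transformer's actual code:

```python
import numpy as np

def expand_scales(scales, N, K, block_size=32):
    # Illustrative only: broadcast per-tensor (scalar) or per-channel
    # (length-N) DQ scales to one scale per (column, K-block) pair.
    n_blocks_per_col = (K + block_size - 1) // block_size
    scales = np.asarray(scales, dtype=np.float32)
    if scales.ndim == 0:        # per-tensor: one scale for the whole tensor
        expanded = np.full((N, n_blocks_per_col), scales, dtype=np.float32)
    elif scales.shape == (N,):  # per-channel: one scale per output column
        expanded = np.repeat(scales[:, None], n_blocks_per_col, axis=1)
    else:
        raise ValueError("expected a scalar or a length-N scale tensor")
    return expanded.reshape(-1)  # flat layout for the scales initializer

# e.g. a per-tensor scale for a 4096x4096 weight -> shape (4096 * 128,)
flat = expand_scales(0.02, N=4096, K=4096)
```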
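The block size is read from the session configuration; a usage sketch (model path and value are placeholders):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Block size used when expanding per-tensor/per-channel scales (default: 32).
so.add_session_config_entry("session.qdq_matmulnbits_block_size", "64")
sess = ort.InferenceSession("qdq_model.onnx", so)  # placeholder model path
```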
### Changes
- **Selectors** (qdq_selectors.cc): Replaced
`ValidateBlockwiseDQForMatMulNBits` with `ValidateDQForMatMulNBits`,
which supports all three quantization modes (blockwise, per-tensor,
per-channel). Added Gemm-specific validation.
- **Actions** (qdq_actions.cc): Added scale/zero-point expansion for
non-blockwise cases, Gemm attribute cleanup, and bias wiring to
MatMulNBits input 5 (see the fused-node sketch after this list).
- **Registration** (qdq_selector_action_transformer.cc): Registered
`Gemm` alongside `MatMul`; threaded `qdq_matmulnbits_block_size` from
session config.
- **Tests** (qdq_matmulnbits_transformer_test.cc): Added tests for
per-tensor, per-channel, Gemm (no bias, constant bias, DQ bias), block
size options, and negative cases.
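
For reference, a hypothetical sketch of the fused result built with `onnx.helper`, assuming the contrib op's input order is A, B, scales, zero_points, g_idx, bias (so the Gemm bias lands at index 5 and the unused g_idx slot is left empty); dimensions and names are placeholders:

```python
from onnx import helper

# Hypothetical fused node: DQ->Gemm(bias) collapsed into MatMulNBits.
# g_idx (input 4) is unused here, so that slot is the empty string.
fused = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B_quant", "scales", "zero_points", "", "bias"],
    outputs=["Y"],
    domain="com.microsoft",
    K=4096, N=4096, bits=4, block_size=32,  # placeholder attribute values
)
```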