onnxruntime
4236d6e9 - Fail loudly when MatMulNBits receives unsupported block_size on CPU EP (#28590)

Commit

55 days ago

### Description

`MatMulNBits` with `block_size > 256` (e.g. 512) silently produces all-zero output on CPU EP. The MLAS dequantization fallback path has a `default: break;` that skips dequantization entirely. This leaves a zero-initialized buffer that then flows into GEMM.

Changes:

- Add `ORT_ENFORCE` in the `MatMulNBits` constructor to reject block sizes other than {16, 32, 64, 128, 256}. This surfaces the error at session initialization.
- Replace the silent `default: break;` with `ORT_ENFORCE` in `MlasDequantizeBlockwise`, `MlasQuantizeBlockwise`, and `MlasBlockwiseQuantizedBufferSizes` as defense-in-depth.
- Add regression test `MatMulNBits.UnsupportedBlockSize_512`.

### Motivation and Context

Fix for https://github.com/microsoft/onnxruntime/issues/28551.

Users passing `block_size=512` get silently wrong results with no error or warning. That value is valid per the op spec, which only requires a power of 2 ≥ 16. The bug affects real models such as Tencent's Hy-MT1.5-1.8B-2bit GGUF, which uses per-512-element scales. The fix converts silent miscomputation into an immediate, actionable error message.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
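The following is a minimal C++ sketch of the two guards described above, written under simplifying assumptions. It is not ONNX Runtime's actual `MatMulNBits` kernel or MLAS code.

All of these names are hypothetical stand-ins for illustration:

- `ENFORCE` (a throwing macro in place of the real `ORT_ENFORCE`)
- `IsSupportedBlockSize`
- `MatMulNBitsSketch`
- `DequantizeImpl`
- `DequantizeBlockwise`

Quantized codes are also simplified to one unpacked value per byte, with no zero points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ORT_ENFORCE: throw rather than continue silently.
#define ENFORCE(cond, msg)                         \
  do {                                             \
    if (!(cond)) throw std::invalid_argument(msg); \
  } while (0)

// Block sizes the CPU EP accepts after this change, per the commit message.
inline bool IsSupportedBlockSize(int64_t block_size) {
  switch (block_size) {
    case 16: case 32: case 64: case 128: case 256:
      return true;
    default:
      return false;
  }
}

// Hypothetical kernel: validate in the constructor so an unsupported
// attribute fails at session initialization, before any inference runs.
class MatMulNBitsSketch {
 public:
  explicit MatMulNBitsSketch(int64_t block_size) : block_size_(block_size) {
    ENFORCE(IsSupportedBlockSize(block_size),
            "MatMulNBits: unsupported block_size " + std::to_string(block_size) +
                " (CPU EP supports 16, 32, 64, 128, 256)");
  }
  int64_t block_size() const { return block_size_; }

 private:
  int64_t block_size_;
};

// Simplified dequantization (assumption): one unpacked code per byte,
// no zero points, one scale per block of BlockSize consecutive elements.
template <int BlockSize>
void DequantizeImpl(const std::vector<uint8_t>& q, const std::vector<float>& scales,
                    std::vector<float>& out) {
  assert(q.size() == out.size());
  assert(scales.size() * BlockSize >= out.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<float>(q[i]) * scales[i / BlockSize];
  }
}

// Dispatch on the runtime block size. The default branch throws; a plain
// `break;` there would leave `out` untouched (e.g. still zero-filled).
void DequantizeBlockwise(int block_size, const std::vector<uint8_t>& q,
                         const std::vector<float>& scales, std::vector<float>& out) {
  switch (block_size) {
    case 16: DequantizeImpl<16>(q, scales, out); break;
    case 32: DequantizeImpl<32>(q, scales, out); break;
    case 64: DequantizeImpl<64>(q, scales, out); break;
    case 128: DequantizeImpl<128>(q, scales, out); break;
    case 256: DequantizeImpl<256>(q, scales, out); break;
    default:
      ENFORCE(false, "DequantizeBlockwise: unsupported block_size " +
                         std::to_string(block_size));
  }
}
```

In this sketch, the constructor check rejects a model with `block_size=512` as soon as the kernel is created. The throwing `default` branch guards any other path that reaches the dispatch directly, which mirrors the defense-in-depth change to the MLAS entry points.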

References

#28590 - Fail loudly when MatMulNBits receives unsupported block_size on CPU EP

Author

Copilot

Parents

5996a1ec
