# onnxruntime commit 574806bf: Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)

### Description

This change fixes correctness issues in two areas that were causing failures in onnxruntime_test_all:

- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it's fixed:

1) DynamicQuantizeMatMul.WithConstantBInputs
   - Root cause: the Kleidi dynamic quantization GEMM path could be selected even when the B scales contained invalid values (zero, negative, or non-finite). That violates the kernel's assumptions and can lead to incorrect results.
   - Fix: in `onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`, we now explicitly validate that all B scales are finite and strictly positive before enabling the Kleidi/MLAS dynamic path. If any scale is invalid, that path is disabled. A sketch of this check appears below.

2) Attention tests (Attention3DDefault, Attention3DWithPastAndPresentQkMatmul)
   - Root causes in `onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
     - Incorrect handling of GEMM corner cases for alpha/beta and K == 0 (e.g., not respecting C = beta*C when alpha == 0 or K == 0).
     - Unnecessary or premature fallbacks for small shapes.
   - Fixes (sketched at the end of this description):
     - Add early-outs for degenerate sizes: if M == 0 or N == 0, return as handled.
     - Correctly implement alpha/beta semantics: when alpha == 0 or K == 0, scale the existing output as C = beta*C instead of running the matrix product.

---------

Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
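The B-scale guard described in fix 1 can be illustrated with a minimal standalone sketch. This is not the actual code in `dynamic_quantize_matmul.cc`; the helper name `AllBScalesFiniteAndPositive` and the raw-pointer signature are assumptions for illustration only.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper illustrating the guard added in
// dynamic_quantize_matmul.cc: the Kleidi/MLAS dynamic quantization
// path is only enabled when every B scale is finite and strictly
// positive, since the kernel assumes valid scales.
static bool AllBScalesFiniteAndPositive(const float* b_scales, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    // std::isfinite rejects NaN and +/-infinity; the comparison
    // rejects zero and negative scales.
    if (!std::isfinite(b_scales[i]) || b_scales[i] <= 0.0f) {
      return false;
    }
  }
  return true;
}
```

With a check of this shape in place, an invalid scale simply routes the MatMul to the default implementation instead of the Kleidi path, rather than producing incorrect results.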
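The GEMM corner-case handling from fix 2 can likewise be sketched as a prologue that runs before the main kernel. This is a hedged illustration: the function name `HandleSgemmCornerCases`, the row-major layout, and the simplified signature are assumptions, not the real MLAS/KleidiAI interface in `sgemm_kleidiai.cpp`.

```cpp
#include <cstddef>

// Hypothetical SGEMM prologue for C = alpha * A * B + beta * C
// (row-major MxK times KxN). Returns true if the call was fully
// handled by the corner cases, so the packed kernel can be skipped.
static bool HandleSgemmCornerCases(size_t M, size_t N, size_t K,
                                   float alpha, float beta,
                                   float* C, size_t ldc) {
  // Degenerate output: nothing to compute or scale.
  if (M == 0 || N == 0) {
    return true;  // report as handled instead of falling back
  }
  // If alpha == 0 or K == 0, the product term vanishes and the
  // correct result is C = beta * C.
  if (alpha == 0.0f || K == 0) {
    for (size_t i = 0; i < M; ++i) {
      for (size_t j = 0; j < N; ++j) {
        // beta == 0 overwrites C outright, following the BLAS
        // convention that the prior contents (possibly
        // uninitialized or non-finite) are ignored in that case.
        C[i * ldc + j] = (beta == 0.0f) ? 0.0f : beta * C[i * ldc + j];
      }
    }
    return true;
  }
  return false;  // the main GEMM kernel must run
}
```

Reporting these cases as handled, rather than falling back, matches the fix's intent of removing unnecessary or premature fallbacks for small shapes.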