onnxruntime
d1d596c8 - [MLAS] Add kleidiai pad ptr invalidation test case (#27465)

Commit
78 days ago
### Description

This PR introduces some minor code changes which do the following:

- Fix Copilot header include suggestions from https://github.com/microsoft/onnxruntime/pull/27439
- Add a test case which covers the code path fixed via https://github.com/microsoft/onnxruntime/pull/27215 and the test case discussed in https://github.com/microsoft/onnxruntime/pull/27214
- Change pointer invalidation to cover only the updated pointer in the pad structure

### Testing

This patch was tested in two ways.

1) After creating tests which I thought would trigger a previous failure case, I reverted the convolve_kleidiai.cpp code to its state before the initial pad ptr fix in [Hari's change](https://github.com/microsoft/onnxruntime/pull/27215) was introduced. I then added debug logging and tested for failures to highlight the moving and invalidation of the pointer. An example failure is shown below.
2) I reintroduced the current code and then tested multiple times:

`for i in $(seq 1 2000); do echo "ITER=$i"; ./onnxruntime_mlas_test --long --gtest_filter='*Conv2d*' || break; done`

### Explanation of Subsequent logs

1) **Padding buffer relocation**
   - `KLEIDIAI_CONV_LHS pad_buf MOVED ci=320 padsize=512 old=0x12e80d800 new=0x12e81ac00`
   - Meaning: the internal zero padding buffer used for out-of-bounds pixels was resized and the underlying storage address changed (`old` → `new`). Any previously-built indirection table entries that pointed at the old padding buffer are now stale.
2) **Reuse of cached indirection table after the move**
   - `KLEIDIAI_CONV_LHS indirection_cache HIT ci=64 m=121 **pad=0x12e81ac00 old_pad=0x12e80d800 (after_pad_move)**`
   - Meaning: for a later convolution (`ci=64`) the indirection-table cache returned a HIT. The log prints the current pad buffer address (`pad=...`) and the most recent prior padding-buffer address (`old_pad=...`) captured during the move.
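The two log events above can be modelled in a few lines. What follows is a minimal sketch, not the MLAS or KleidiAI code: `PadBuffer`, `CachedTable`, `IndirectionCache`, `FinalTableIsFresh` and the `check_pad` switch are all hypothetical names chosen for illustration, and the sizes 256 and 512 are simply borrowed from the `padsize=` values in the log. It assumes a cache keyed by channel count whose entries remember the pad address they were built against.

```cpp
// Illustrative model only: every type and function here is hypothetical and
// is not taken from convolve_kleidiai.cpp.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Shared zero-padding buffer that grows on demand.
struct PadBuffer {
    std::vector<float> storage;

    // Grows the buffer if needed; returns true when the storage address changed.
    bool EnsureSize(std::size_t n) {
        if (n <= storage.size()) return false;
        const float* old_ptr = storage.data();
        std::vector<float> grown(n, 0.0f);  // allocated while the old block is still alive
        storage.swap(grown);                // the old block is released when `grown` dies
        return storage.data() != old_ptr;
    }

    std::uintptr_t Address() const {
        return reinterpret_cast<std::uintptr_t>(storage.data());
    }
};

// A cached indirection table, reduced to the one fact that matters here:
// which pad address its out-of-bounds entries were built against.
struct CachedTable {
    std::uintptr_t pad_used = 0;
    std::size_t rows = 0;
};

struct IndirectionCache {
    std::unordered_map<std::size_t, CachedTable> tables;  // keyed by channel count (ci)

    // Returns the table for `ci`. With check_pad == false a HIT is returned
    // unconditionally (the pre-fix behaviour described above); with
    // check_pad == true an entry built against another pad address is rebuilt.
    const CachedTable& Get(std::size_t ci, std::size_t m, std::uintptr_t pad, bool check_pad) {
        auto it = tables.find(ci);
        const bool miss = (it == tables.end());
        const bool stale = !miss && it->second.pad_used != pad;
        if (miss || (check_pad && stale)) {
            CachedTable fresh;
            fresh.pad_used = pad;
            fresh.rows = m;
            tables[ci] = fresh;
            it = tables.find(ci);
        }
        return it->second;
    }
};

// Replays the logged sequence: ci=64 (MISS), ci=320 (pad buffer moves, MISS),
// ci=64 again (HIT). Returns true when the final table refers to the current pad.
bool FinalTableIsFresh(bool check_pad) {
    PadBuffer pad;
    IndirectionCache cache;

    pad.EnsureSize(256);
    cache.Get(64, 121, pad.Address(), check_pad);

    const bool moved = pad.EnsureSize(512);
    assert(moved);
    (void)moved;
    cache.Get(320, 121, pad.Address(), check_pad);

    pad.EnsureSize(256);  // already large enough, so the address is unchanged
    const CachedTable& table = cache.Get(64, 121, pad.Address(), check_pad);
    return table.pad_used == pad.Address();
}
```

In this model `FinalTableIsFresh(false)` returns `false`, because the `ci=64` entry still records the old pad address, while `FinalTableIsFresh(true)` returns `true`. Comparing the recorded address on every HIT is only one way to detect the stale case; the actual change may invalidate entries differently.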
The `(after_pad_move)` tag indicates that this cache HIT occurred after a pad-buffer relocation event, which is the dangerous case in the pre-fix implementation (cached tables may still contain pointers to `old_pad`). In failing runs, the output mismatch occurs immediately after this sequence, showing a clear correlation: **pad buffer moved → cached indirection table reused → incorrect results**.

* Note: because of time constraints on the 2000 runs, I commented out most of the rest of the fixture in the changed file before running the test.

```
jonclo01$ ./onnxruntime_mlas_test --long --gtest_filter='*Conv2d*' clear
-------------------------------------------------------
----Total 3066 tests registered programmably!
-------------------------------------------------------
Note: Google Test filter = *Conv2d*
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Conv2d_SingleThread
[ RUN ] Conv2d_SingleThread.LongExecute
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 496 : KLEIDIAI_CONV_LHS pad_buf ci=64 padsize=256 addr=0x12e80d800
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 543 : KLEIDIAI_CONV_LHS indirection_cache MISS ci=64 m=121 pad=0x12e80d800
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 376 : kai_run_rhs_imatmul_pack_kxn_x32p2vlx1b_x32_x32_sme N=32 k_chunk_count=9 k_chunk_length=64 rhs_stride_row=128
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=576 nr=32 kr=1 sr=1 rhs_stride_row=484
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 490 : KLEIDIAI_CONV_LHS **pad_buf MOVED ci=320 padsize=512 old=0x12e80d800 new=0x12e81ac00**
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 543 : KLEIDIAI_CONV_LHS indirection_cache MISS ci=320 m=121 pad=0x12e81ac00
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=320
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 376 : kai_run_rhs_imatmul_pack_kxn_x32p2vlx1b_x32_x32_sme N=32 k_chunk_count=9 k_chunk_length=320 rhs_stride_row=128
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=320
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=2880 nr=32 kr=1 sr=1 rhs_stride_row=484
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 535 : KLEIDIAI_CONV_LHS indirection_cache HIT ci=64 m=121 **pad=0x12e81ac00 old_pad=0x12e80d800 (after_pad_move)**
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=576 nr=32 kr=1 sr=1 rhs_stride_row=484
/Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/test/mlas/unittest/test_conv2d.h:249: Failure
Expected equality of these values:
  memcmp(Output, OutputReference, OutputElements * sizeof(float))
    Which is: 90
  0
B1/G1/Cpg64/Fpg32/H11/W11/KH3/KW3/Pad1,1,1,1/Dilation1,1/Stride1,1
Stack trace:
  0x10247ba34: MlasConv2DTest<>::ExecuteLong()
  0x102651904: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x1026517a4: testing::Test::Run()
  0x102652b5c: testing::TestInfo::Run()
  0x102653c84: testing::TestSuite::Run()
... Google Test internal frames ...
[ FAILED ] Conv2d_SingleThread.LongExecute, where GetParam() = LongExecute (10 ms)
```

---------

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>