onnxruntime
4e45b295 - [wasm] Optimize WASM SIMD MlasGemmQuantKernel (#25136)

Commit
282 days ago
[wasm] Optimize WASM SIMD MlasGemmQuantKernel (#25136)

### Description

This change optimizes MlasGemmQuantKernel for the WASM SIMD build by introducing a 4x8 micro kernel.

### Motivation and Context

This change improves the performance of QGEMM on x64 devices using the WASM SIMD build.

| Mlas bench/LNL laptop/node v24.2.0 | improvement |
|------------------------------------------------------------------------|-------------|
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:1024/Batch:1/Threads:4/real_time | 51% |
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:3072/Batch:1/Threads:4/real_time | 50% |
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:4096/Batch:1/Threads:4/real_time | 51% |
| QGEMM/UnsignedANoPackB/M:384/N:4096/K:1024/Batch:1/Threads:4/real_time | 71% |
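
### Illustrative sketch

For readers unfamiliar with the term, the following is a minimal scalar sketch of what a 4x8 micro kernel computes in a quantized GEMM: one 4-row by 8-column tile of int32 accumulators built from uint8 inputs. It is not the code from this change. The names `MicroKernel4x8`, `QGemmTiled`, `kTileM`, and `kTileN` are hypothetical, the sketch assumes row-major unpacked operands with M divisible by 4 and N divisible by 8, and it omits zero-point handling and SIMD intrinsics entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile sizes for illustration; not taken from the MLAS sources.
constexpr std::size_t kTileM = 4;
constexpr std::size_t kTileN = 8;

// Accumulates one 4x8 tile: C[4][8] += A[4][K] * B[K][8].
// A, B, and C are row-major with leading dimensions lda, ldb, and ldc.
void MicroKernel4x8(const std::uint8_t* A, std::size_t lda,
                    const std::uint8_t* B, std::size_t ldb,
                    std::int32_t* C, std::size_t ldc, std::size_t K) {
    // Keep all 32 accumulators local so each loaded A and B element is reused.
    std::int32_t acc[kTileM][kTileN] = {};
    for (std::size_t k = 0; k < K; ++k) {
        const std::uint8_t* b_row = B + k * ldb;
        for (std::size_t i = 0; i < kTileM; ++i) {
            const std::int32_t a = A[i * lda + k];
            for (std::size_t j = 0; j < kTileN; ++j) {
                acc[i][j] += a * static_cast<std::int32_t>(b_row[j]);
            }
        }
    }
    // Write the finished tile back to C.
    for (std::size_t i = 0; i < kTileM; ++i) {
        for (std::size_t j = 0; j < kTileN; ++j) {
            C[i * ldc + j] += acc[i][j];
        }
    }
}

// Covers an M x N output with 4x8 tiles. Assumes M % 4 == 0 and N % 8 == 0.
std::vector<std::int32_t> QGemmTiled(const std::vector<std::uint8_t>& A,
                                     const std::vector<std::uint8_t>& B,
                                     std::size_t M, std::size_t N, std::size_t K) {
    assert(M % kTileM == 0 && N % kTileN == 0);
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<std::int32_t> C(M * N, 0);
    for (std::size_t m = 0; m < M; m += kTileM) {
        for (std::size_t n = 0; n < N; n += kTileN) {
            MicroKernel4x8(A.data() + m * K, K, B.data() + n, N,
                           C.data() + m * N + n, N, K);
        }
    }
    return C;
}
```

A larger tile generally lets each loaded element of A and B contribute to more multiply-accumulates before it is discarded, which is the usual motivation for widening a micro kernel. A SIMD version would hold the accumulators in vector registers rather than a local array.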