onnxruntime
c50a9af8 - [WebGPU] Implement Split-K on GEMM (#26751)

Commit
59 days ago
[WebGPU] Implement Split-K on GEMM (#26751)

### Description

This patch implements the `Split-K` optimization on `GEMM`.

1. Support handling `GEMM` in `MatMulFillBiasOrZeroBeforeSplitKProgram`. We need to add `beta` as a new uniform value and all the parameters that are used to handle all the cases of `GEMM` in `MatMulWriteFnSource()` (including the broadcast of `beta` on both dimensions).
2. Support `Split-K` in `GemmProgram::GenerateShaderCode()`.
3. Add cases to `GemmOptimizePackedTest` to test `Split-K` in `GEMM`.

### Motivation and Context

With this PR we can achieve about 20% improvement in `florence-2-base-decoder-with-past-fp16` and 10% improvement in `detr-resnet-50-fp16` on Lunar Lake iGPU.
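For readers unfamiliar with the technique, the following is a minimal CPU sketch of the general Split-K idea applied to `GEMM` (`Y = alpha * A * B + beta * C`). It is not the WebGPU shader code from this patch. The function names `bias_at`, `gemm_reference`, and `gemm_split_k` and the parameter `split_size` are hypothetical and chosen only for illustration. The sketch assumes the two-pass structure suggested by the name `MatMulFillBiasOrZeroBeforeSplitKProgram`: the output is first filled with `beta * C` (or zero when there is no `C`), and then each slice of the K dimension adds its partial product. On a GPU the slices would typically run in parallel and accumulate into the output, whereas here they run sequentially.

```python
def bias_at(c, i, j):
    """Return C[i][j], broadcasting a C of shape (M,N), (1,N), (M,1) or (1,1)."""
    if c is None:
        return 0.0
    row = c[i] if len(c) > 1 else c[0]
    return row[j] if len(row) > 1 else row[0]


def gemm_reference(a, b, c, alpha, beta):
    """Plain GEMM without splitting: Y = alpha * (A @ B) + beta * C."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * bias_at(c, i, j)
            for j in range(n)
        ]
        for i in range(m)
    ]


def gemm_split_k(a, b, c, alpha, beta, split_size):
    """Split-K GEMM sketch: pre-fill the output, then accumulate per K slice."""
    m, k, n = len(a), len(b), len(b[0])

    # Pass 1: fill the output with beta * C (broadcast), or zero if C is absent.
    y = [[beta * bias_at(c, i, j) for j in range(n)] for i in range(m)]

    # Pass 2: every K slice contributes alpha * (partial dot product).
    # Slices are independent of each other, which is what exposes
    # extra parallelism when M and N are small but K is large.
    for k0 in range(0, k, split_size):
        k1 = min(k0 + split_size, k)
        for i in range(m):
            for j in range(n):
                partial = sum(a[i][p] * b[p][j] for p in range(k0, k1))
                y[i][j] += alpha * partial
    return y
```

Because the bias term is written once in the first pass, the accumulation pass never has to decide which slice is responsible for adding `beta * C`. With floating-point inputs the split version can differ from the unsplit one by rounding, since the summation order changes.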
Author
Committer
sumikuma