Remove `sequentially_access_by_threads` for Conv (#24938)
### Description
<!-- Describe your changes. -->
In Transforms.js, the `sequentially_access_by_threads` flag should be
set to `true` **only** when the GPU vendor is Intel, as experiments have
shown that Intel GPUs perform better with this setting enabled.
Currently, ORT sets `sequentially_access_by_threads` to `true`
regardless of the GPU vendor.
However, based on my local testing, setting
`sequentially_access_by_threads` to `false` consistently results in
better performance across all platforms.
In ONNX Runtime (ORT), this flag is only applied to Conv operators that
do not use `vec4` packing (i.e., the `MakeMatMulPackedSource` path). For
GEMM/MatMul operators without `vec4`, the flag is already `false`.
Therefore, this change only affects Conv cases without `vec4`.
This PR leads to performance improvements in certain convolution cases.
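To illustrate what the flag controls, here is a minimal sketch (hypothetical code, not ORT's actual `MakeMatMulPackedSource` generator): with `sequentially_access_by_threads` enabled, each thread walks a contiguous chunk of elements; with it disabled, consecutive threads touch consecutive elements (an interleaved, coalesced-friendly pattern), which is the layout this PR's measurements favor on non-Intel GPUs as well.

```typescript
// Hypothetical illustration of the two access patterns the flag selects
// between. `indicesForThread` is an invented helper, not an ORT API.
function indicesForThread(
  thread: number,
  numThreads: number,
  workPerThread: number,
  sequential: boolean,
): number[] {
  const out: number[] = [];
  for (let i = 0; i < workPerThread; i++) {
    out.push(
      sequential
        ? thread * workPerThread + i // contiguous chunk per thread
        : i * numThreads + thread,   // interleaved across threads
    );
  }
  return out;
}

// With 4 threads and 2 elements each:
// sequential=true  → thread 0 reads [0, 1], thread 1 reads [2, 3]
// sequential=false → thread 0 reads [0, 4], thread 1 reads [1, 5]
```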
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
I tested with a local conv model (input x[1,256,224,224], weight[63, 256, 3, 3],
which does not use vec4). The results:

| (ms) | M3 Max | NVIDIA P620 | NVIDIA 5080 | Intel |
|----------------|-------|------------|------------|-------|
| `sequentially_access_by_threads == true` | 11.2 | 112 | 2.88 | 85.9 |
| `sequentially_access_by_threads == false` | **7** | **66** | **1.90** | **53.4** |