onnxruntime
992c5981 - Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt (#26103)

Commit

102 days ago

Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt (#26103)

### Description

This is an internal branch dupe of https://github.com/microsoft/onnxruntime/pull/25255, plus some minor cosmetic changes to account for Copilot feedback.

### Motivation and Context

Improve performance of NCHW Conv. Both grouped convolutions and batched inputs should benefit from this change. For a detailed understanding of the perf improvement, please refer to the numbers in https://github.com/microsoft/onnxruntime/pull/25255.

Credit to @zoeczy and team for this improvement and code change.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

References

#26103 - Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt

Author

hariharans29

Parents

9e79b367
