SemanticDiff

pytorch
65e8fe18 - Perf optimization for conv and gemm kernels. (#37626)


Commit

4 years ago

Perf optimization for conv and gemm kernels. (#37626)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37626

Rescheduled instructions to hide the latency of the loads. At the start of the kernel in particular, there are latency-bound dependency chains. This seems to improve perf for aarch32.

Also rescheduled some instructions in the aarch64 gemm kernel. It is not clear whether this actually helps perf, especially on out-of-order (OOO) CPUs, but it is worth a try.

Test Plan:
qnnpack tests q8gemm-test

Imported from OSS

Differential Revision: D21339037

fbshipit-source-id: 0469581a0e3bd3fd04f15200c2171fc8c264722b
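The rescheduling idea in the commit message can be illustrated with a small C sketch. This is not the QNNPACK kernel, which is hand-written assembly. The function names `dot_naive` and `dot_rescheduled` are hypothetical, and an int8 dot product stands in for the real gemm inner loop. The second version issues the loads for the next iteration before the multiply-accumulate of the current one, so the arithmetic does not wait directly on a load that was just issued.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: each iteration loads two values and uses them immediately,
 * so the multiply-accumulate waits on the load latency. */
int32_t dot_naive(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Rescheduled (hypothetical illustration): the loads for iteration i + 1
 * are issued before the multiply-accumulate for iteration i. */
int32_t dot_rescheduled(const int8_t *a, const int8_t *b, size_t n) {
    if (n == 0) {
        return 0;
    }
    int32_t acc = 0;
    /* Prologue: the first loads happen before the loop starts. */
    int8_t a_cur = a[0];
    int8_t b_cur = b[0];
    for (size_t i = 0; i + 1 < n; i++) {
        /* Issue the next loads first ... */
        int8_t a_next = a[i + 1];
        int8_t b_next = b[i + 1];
        /* ... then consume the values loaded one iteration earlier. */
        acc += (int32_t)a_cur * (int32_t)b_cur;
        a_cur = a_next;
        b_cur = b_next;
    }
    /* Epilogue: consume the last pair of loaded values. */
    acc += (int32_t)a_cur * (int32_t)b_cur;
    return acc;
}
```

Both functions compute the same result. In C the compiler is free to reorder these statements, which is one reason kernels like this are scheduled by hand in assembly. On out-of-order cores the hardware may already overlap the loads, which matches the commit's uncertainty about the aarch64 gain.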

Author

kimishpatel


Committer

facebook-github-bot


