pytorch
4c3d3b71 - [inductor] Lower small gemvs on CPU (#110456)

1 year ago
If the gemv fits in registers, like [1,16]*[16,16], MKL isn't going to do much better than a simple compiled for-loop, and calling into it means paying allocation overhead and ATen overhead on top of the tiny amount of actual compute.

A very small internal inference model drops from 7 µs to 5 µs with this change.

Differential Revision: [D49875991](https://our.internmc.facebook.com/intern/diff/D49875991/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110456
Approved by: https://github.com/chenyang78, https://github.com/jgong5
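For illustration only, here is the loop structure that a small gemv reduces to, written in plain Python. This is not Inductor's generated code. On CPU Inductor emits compiled C++, and the helper name `gemv_loop` is hypothetical. The point is the size of the work: a [1,16]*[16,16] product is 256 multiply-adds into 16 accumulators, which is small enough that a fixed per-call cost (allocating the output, dispatching through ATen to MKL) can rival the compute itself.

```python
from typing import List, Sequence


def gemv_loop(x: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """Compute y = x @ w for a row vector x (length K) and a K x N matrix w.

    Hypothetical helper sketching the nested loop a compiler could emit for a
    tiny gemv instead of calling a BLAS library such as MKL.
    """
    k = len(x)
    if len(w) != k:
        raise ValueError(f"shape mismatch: x has {k} elements, w has {len(w)} rows")
    n = len(w[0]) if k else 0
    # One accumulator per output column; in compiled code these can stay in registers.
    y = [0.0] * n
    for i in range(k):
        xi = x[i]
        row = w[i]
        for j in range(n):
            y[j] += xi * row[j]
    return y
```

For example, with `x = [float(i) for i in range(16)]` and a 16x16 identity matrix, `gemv_loop` returns `x` unchanged. With a 16x16 matrix of ones, every output is `120.0`, the sum of 0 through 15.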