[inductor] Lower small gemvs on CPU (#110456)
If the gemv fits in registers, e.g. [1,16]*[16,16], MKL isn't going to
do much better than a compiled simple for-loop, and calling out to it
means paying allocation overhead and ATen overhead on top.
A very small internal inference model drops from 7 us to 5 us with this change.
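
For illustration, here is a minimal sketch of the kind of plain loop such a small gemv can reduce to on CPU. This is not the actual Inductor-generated kernel; the function name, fixed sizes, and row-major weight layout are assumptions for the example:

```cpp
#include <cstddef>

// Hypothetical hand-written equivalent of a [1,16] x [16,16] gemv:
// everything fits in registers, so there is no library call, no dispatch,
// and no intermediate allocation.
void small_gemv_1x16x16(const float* x,   // [16]    input vector
                        const float* w,   // [16,16] weight, row-major
                        float* out) {     // [16]    output vector
    for (std::size_t j = 0; j < 16; ++j) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < 16; ++k) {
            acc += x[k] * w[k * 16 + j];
        }
        out[j] = acc;
    }
}
```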
Differential Revision: [D49875991](https://our.internmc.facebook.com/intern/diff/D49875991/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110456
Approved by: https://github.com/chenyang78, https://github.com/jgong5