pytorch
fd0bf96c - [inductor] make multi-kernel work with cpp-wrapper (#117813)

Commit

280 days ago

[inductor] make multi-kernel work with cpp-wrapper (#117813)

Make multi-kernel work with cpp-wrapper. Multi-kernel generates two equivalent variants of a reduction and picks the faster one at runtime. cpp-wrapper, however, needs to save the cubin file during codegen, so the two features did not originally work together.

Thanks to Jason for suggesting a neat way to integrate the two. cpp-wrapper currently does codegen in two passes. In the first pass, we still generate the multi-kernel code and run it; in the second pass, we load the cubin file for the faster kernel directly. The multi-kernel Python code is not generated in the second pass, since it is no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117813
Approved by: https://github.com/jansel
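The two-pass flow described in the commit message can be illustrated with a small, self-contained sketch. This is not Inductor's actual implementation: `TwoPassCodegen`, `first_pass`, `second_pass`, the `benchmark` hook, and the `.cubin` file names are all hypothetical names chosen for illustration, and plain Python functions stand in for the two equivalent reduction kernels.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-ins for the two equivalent reduction variants.
Kernel = Callable[[Sequence[float]], float]


def reduce_builtin(xs: Sequence[float]) -> float:
    """Variant A: max-reduction using the built-in."""
    return max(xs)


def reduce_loop(xs: Sequence[float]) -> float:
    """Variant B: the same max-reduction written as an explicit loop."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def wall_clock(fn: Kernel, xs: Sequence[float], repeats: int = 5) -> float:
    """Best-of-N wall-clock time for one variant (assumed benchmark policy)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        best = min(best, time.perf_counter() - start)
    return best


class TwoPassCodegen:
    """Toy model: pass 1 runs all variants and records the winner,
    pass 2 refers only to the winner's precompiled artifact."""

    def __init__(self, benchmark: Optional[Callable[[Kernel, Sequence[float]], float]] = None):
        self.benchmark = benchmark or wall_clock
        self.choice: Dict[str, int] = {}  # kernel name -> index of faster variant

    def first_pass(self, name: str, variants: List[Kernel], xs: Sequence[float]) -> float:
        # Time every variant, remember the fastest, and return its result.
        timings = [self.benchmark(fn, xs) for fn in variants]
        idx = timings.index(min(timings))
        self.choice[name] = idx
        return variants[idx](xs)

    def second_pass(self, name: str, artifacts: List[str]) -> str:
        # No variant selection code here: only the winner's artifact is used.
        return artifacts[self.choice[name]]


if __name__ == "__main__":
    cg = TwoPassCodegen()
    cg.first_pass("red0", [reduce_builtin, reduce_loop], list(range(1000)))
    # Hypothetical artifact names; the real cubin paths are not given in the commit.
    print(cg.second_pass("red0", ["red0_variant0.cubin", "red0_variant1.cubin"]))
```

The point of the design is that the selection logic lives entirely in the first pass; the second pass only needs the recorded choice, which is why the multi-kernel Python code can be skipped there.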

Author

shunting314

Committer

pytorchmergebot

Parents

04d52d53
