pytorch
fd0bf96c - [inductor] make multi-kernel work with cpp-wrapper (#117813)

Commit
280 days ago
[inductor] make multi-kernel work with cpp-wrapper (#117813)

Make multi-kernel work with cpp-wrapper. Multi-kernel generates two equivalent variants for a reduction, and the faster one is picked at runtime. But cpp-wrapper needs to save the cubin file during codegen, so the two features did not work with each other at first.

Thanks Jason for suggesting a neat way to integrate these two. cpp-wrapper currently does two passes of codegen. In the first pass, we still generate the multi-kernel code and run it; in the second pass, we load the cubin file for the faster kernel directly. The multi-kernel Python code is not generated in the second pass since it should not be needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117813
Approved by: https://github.com/jansel
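
The two-pass flow described in the commit message can be illustrated with a small, self-contained sketch. This is not inductor's code: `picked_variant`, `benchmark`, `first_pass`, and `second_pass` are hypothetical names, and plain Python functions stand in for the two compiled kernel variants (and for the cubin file loaded in the second pass).

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical cache: multi-kernel name -> index of the faster variant.
# In this sketch it plays the role of "remember which kernel won in pass 1".
picked_variant: Dict[str, int] = {}


def sum_builtin(xs: Sequence[int]) -> int:
    """Variant 0 of the reduction (stand-in for one generated kernel)."""
    return sum(xs)


def sum_loop(xs: Sequence[int]) -> int:
    """Variant 1 of the reduction; equivalent result, different code."""
    total = 0
    for x in xs:
        total += x
    return total


def benchmark(fn: Callable, arg, repeats: int = 5) -> float:
    """Return the best wall-clock time of `fn(arg)` over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def first_pass(name: str, variants: List[Callable], arg):
    """Pass 1: run every variant, record the faster one, return its result."""
    timings = [benchmark(v, arg) for v in variants]
    winner = timings.index(min(timings))
    picked_variant[name] = winner
    return variants[winner](arg)


def second_pass(name: str, variants: List[Callable], arg):
    """Pass 2: no benchmarking, call the recorded winner directly."""
    return variants[picked_variant[name]](arg)


if __name__ == "__main__":
    data = list(range(1000))
    variants = [sum_builtin, sum_loop]
    print(first_pass("reduction_0", variants, data))
    print(second_pass("reduction_0", variants, data))
```

The point of the sketch is the split of responsibilities: only the first pass pays for running both variants, and the second pass needs nothing but the recorded choice, which mirrors why the multi-kernel Python code can be skipped there.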