Extend Inductor to support the third-party backend (#100706)
This PR extends Inductor to support third-party backends that focus only on code generation, as the C++/OpenMP and Triton backends already do.
Currently, the code generated by Inductor has two major parts: the kernel, and the Python wrapper that glues the kernel. A third-party backend therefore needs to customize both parts to generate its backend-specific code.
- Python wrapper code generation
Inductor provides a `WrapperCodeGen` class that generates the Python wrapper code gluing the kernel. Generating backend-specific wrapper code is therefore straightforward: the third-party backend inherits from `WrapperCodeGen` and overrides only the member functions it needs to change.
- Kernel code generation
Kernel code generation is driven by a `Scheduling` implementation, so the third-party backend needs to provide a custom `Scheduling` for its kernel code generation. Currently, `CppScheduling` and `TritonScheduling` serve the C++/OpenMP and Triton backends, respectively, but they share no common `Scheduling` class. Based on how the scheduler invokes them, this PR abstracts a common `Scheduling` class with the following member functions.
- [group_fn](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#LL649C64-L649C64)
- [flush](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#L1150)
- [can_fuse_vertical](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#L1006)
- [can_fuse_horizontal](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#LL1008C45-L1008C64)
- [codegen_template](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#L1234) _This function is only available for Triton. If the third-party backend is a subclass of `TritonScheduling`, it can override or reuse it._
- [codegen_nodes](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#L1234)
- [codegen_sync](https://github.com/pytorch/pytorch/blob/71c4becda7cbdea8590cb98da9f2ea4c1edc0cc1/torch/_inductor/scheduler.py#LL1251C1-L1251C1) _This function is currently only used for Triton debugging, but it may also be useful for other compute devices, so we prefer to keep it._
The third-party backend needs to inherit from the `Scheduling` class and implement these functions.
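As a rough illustration of the interface listed above, the following self-contained sketch defines a stand-in `Scheduling` base class and a toy subclass. Nothing here is imported from `torch._inductor`: the parameter lists, the default behavior of the optional hooks, the `MyBackendScheduling` name, and its `pending`/`emitted` attributes are all assumptions made for the example, and scheduler nodes are represented as plain strings.

```python
import math
from abc import ABC, abstractmethod


class Scheduling(ABC):
    """Stand-in for the common interface; NOT the class from torch._inductor."""

    @abstractmethod
    def group_fn(self, sizes):
        """Map iteration sizes to a grouping key."""

    @abstractmethod
    def can_fuse_vertical(self, node1, node2):
        """Decide whether two nodes may be fused vertically."""

    @abstractmethod
    def can_fuse_horizontal(self, node1, node2):
        """Decide whether two nodes may be fused horizontally."""

    def codegen_template(self, template_node, epilogue_nodes):
        # Optional hook: per the description, only the Triton backend uses it.
        raise NotImplementedError

    @abstractmethod
    def codegen_nodes(self, nodes):
        """Generate kernel code for a group of nodes."""

    def codegen_sync(self):
        # Optional debugging hook; a no-op default is an assumption.
        pass

    @abstractmethod
    def flush(self):
        """Emit any kernel code generated so far."""


class MyBackendScheduling(Scheduling):
    """Hypothetical third-party backend; nodes are plain strings here."""

    def __init__(self):
        self.pending = []  # kernels generated but not yet emitted
        self.emitted = []  # kernels handed over by flush()

    def group_fn(self, sizes):
        # Hypothetical grouping: collapse each size tuple to its element count.
        return tuple(math.prod(s) for s in sizes)

    def can_fuse_vertical(self, node1, node2):
        return False  # conservative: never fuse

    def can_fuse_horizontal(self, node1, node2):
        return False  # conservative: never fuse

    def codegen_nodes(self, nodes):
        self.pending.append("// kernel for " + ", ".join(nodes))

    def flush(self):
        self.emitted.extend(self.pending)
        self.pending.clear()


backend = MyBackendScheduling()
backend.codegen_nodes(["buf0", "buf1"])
backend.flush()
```

Declaring the mandatory functions as abstract means a backend that forgets one fails at instantiation time rather than partway through code generation; whether the actual class enforces this is not stated in this description.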
Other code-generation classes, such as `CppKernel` and `TritonKernel`, are used by, or are part of the logic of, either `Scheduling` or `WrapperCodeGen`. This PR therefore does not define an interface for them and leaves that flexibility to the third-party backend, which can implement these classes from scratch or reuse them by inheriting and overriding.
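The inherit-and-override pattern described for `WrapperCodeGen` and the kernel classes might look like the sketch below. The base classes are simplified stand-ins written for this example, not Inductor's implementations, and every method name (`header_lines`, `kernel_call_line`, `generate`, `load`, `store`) as well as the `my_backend_runtime` module and `my_store` function are hypothetical.

```python
class WrapperCodeGen:
    """Stand-in wrapper generator (hypothetical methods, not Inductor's)."""

    def header_lines(self):
        return ["import torch"]

    def kernel_call_line(self, name, args):
        return f"{name}({', '.join(args)})"

    def generate(self, calls):
        # Assemble the wrapper source from the header and one line per call.
        lines = self.header_lines()
        lines += [self.kernel_call_line(name, args) for name, args in calls]
        return "\n".join(lines)


class MyBackendWrapperCodeGen(WrapperCodeGen):
    """Overrides only what differs; generate() is reused unchanged."""

    def header_lines(self):
        return super().header_lines() + ["import my_backend_runtime"]

    def kernel_call_line(self, name, args):
        return f"my_backend_runtime.launch({name!r}, {', '.join(args)})"


class BaseKernel:
    """Stand-in for an existing kernel class such as `CppKernel`."""

    def load(self, name, index):
        return f"{name}[{index}]"

    def store(self, name, index, value):
        return f"{name}[{index}] = {value};"


class MyBackendKernel(BaseKernel):
    """Reuses load() and replaces store() with a backend-specific form."""

    def store(self, name, index, value):
        return f"my_store({name}, {index}, {value});"


wrapper_source = MyBackendWrapperCodeGen().generate([("kernel0", ["buf0", "buf1"])])
kernel = MyBackendKernel()
store_line = kernel.store("out", "i", kernel.load("in", "i"))
```

Here `wrapper_source` contains the inherited `import torch` line, the added runtime import, and a launch call, while `store_line` combines the inherited `load` with the overridden `store`.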
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100706
Approved by: https://github.com/jansel