[autocast] Make it easier to register rules (#86402)
On the way to resolving https://github.com/pytorch/pytorch/issues/86294
Previously, there were three macros used to register autocast rules:
- KERNEL
- KERNEL_DIFFERENT_REDISPATCH_SIGNATURE
- KERNEL_CPU
This PR makes the KERNEL and KERNEL_CPU macros less redundant for users.
KERNEL_DIFFERENT_REDISPATCH_SIGNATURE is weird and only used three
times, so I didn't change it.
Concretely, KERNEL(OP, OP_NAME, SIGNATURE, POLICY) is redundant:
- op/op_name are similar, and the signature can be decltype'd.
This PR changes it so that, instead, one uses either:
- KERNEL(OP, POLICY)
- KERNEL2(OP, OVERLOAD, POLICY)
depending on whether the operator name has an overload.
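
To illustrate the idea (this is a toy sketch, not the actual autocast code): the block below shows how a two- or three-argument macro can derive both the registered name and the signature from the operator token alone. `toy::matmul`, `toy::pow_Tensor_Scalar`, `Registry`, and `register_toy_rules` are hypothetical stand-ins invented for this example. The real macros register through `m.impl(...)` against the actual operator entry points, and their exact expansion may differ.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

namespace toy {
// Toy scalar stand-ins for operator entry points (hypothetical, not ATen).
inline float matmul(float a, float b) { return a * b; }
inline float pow_Tensor_Scalar(float base, int exponent) {
  float result = 1.0f;
  for (int i = 0; i < exponent; ++i) result *= base;
  return result;
}
}  // namespace toy

// Hypothetical registry: maps a qualified operator name to a policy name.
struct Registry {
  std::map<std::string, std::string> rules;

  template <class Signature>
  void impl(const std::string& name, const std::string& policy, Signature* fn) {
    static_assert(std::is_function<Signature>::value,
                  "decltype of an operator must be a function type");
    assert(fn != nullptr);
    rules[name] = policy;
  }
};

// decltype recovers the signature, so callers no longer spell it out.
static_assert(std::is_same<decltype(toy::matmul), float(float, float)>::value,
              "decltype yields the full function signature");

// No overload: the name is stringified from OP.
#define KERNEL(OP, POLICY) \
  m.impl<decltype(toy::OP)>("aten::" #OP, #POLICY, &toy::OP);

// With overload: the name becomes "aten::OP.OVERLOAD" and the symbol is
// OP_OVERLOAD (a naming convention assumed for this toy example).
#define KERNEL2(OP, OVERLOAD, POLICY)                                \
  m.impl<decltype(toy::OP##_##OVERLOAD)>("aten::" #OP "." #OVERLOAD, \
                                         #POLICY, &toy::OP##_##OVERLOAD);

inline Registry register_toy_rules() {
  Registry m;
  KERNEL(matmul, lower_precision_fp)
  KERNEL2(pow, Tensor_Scalar, fp32)
  return m;
}
```

In this sketch, each call site names the operator once, and the stringified name and the `decltype`'d signature cannot drift apart from the function being registered.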
This PR also gives the same treatment to the KERNEL_CPU macro, which is
used for registering autocast cpu rules: it splits KERNEL_CPU into
KERNEL_CPU(OP, POLICY) and KERNEL_CPU2(OP, OVERLOAD, POLICY).
I will do some more cleanup of things that are implemented via
`m.impl(...)` in a follow-up PR so that I don't get confused when I need
to rebase.
Test Plan:
- wait for tests (how good are our autocast tests?)
- code reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86402
Approved by: https://github.com/ezyang