Add faithful C++ API (#44087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44087
Each op that takes a TensorOptions argument now has an additional overload in the C++ frontend that takes scattered ScalarType, Layout, Device, and bool arguments instead of a single TensorOptions argument.
If it is a c10-full op, then the scattered version calls into the dispatcher and the gathered version is a proxy calling into the scattered version.
If it is a non-c10-full op, then the gathered version calls into the dispatcher and the scattered version is a proxy calling into the gathered version.
This should minimize the amount of gathering and scattering needed.
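A minimal sketch of the two proxy directions described above, assuming C++17. It does not use the real c10 types or the real dispatcher: `ScalarType`, `Layout`, `Device`, and `TensorOptions` are simplified stand-ins, and `full_op`, `full_op_scattered`, `legacy_op`, and `legacy_op_scattered` are hypothetical names chosen for illustration. Each function returns a string naming the form in which the arguments reached the (simulated) dispatcher.

```cpp
#include <optional>
#include <string>

// Simplified stand-ins for illustration only; NOT the real c10 definitions.
enum class ScalarType { Float, Double };
enum class Layout { Strided, Sparse };
enum class Device { CPU, CUDA };

// Gathered form: all four options bundled into one object.
struct TensorOptions {
  std::optional<ScalarType> dtype;
  std::optional<Layout> layout;
  std::optional<Device> device;
  std::optional<bool> pin_memory;
};

// --- Case 1: c10-full op (hypothetical name) ---
// The scattered overload is the one that reaches the dispatcher.
std::string full_op_scattered(std::optional<ScalarType> /*dtype*/,
                              std::optional<Layout> /*layout*/,
                              std::optional<Device> /*device*/,
                              std::optional<bool> /*pin_memory*/) {
  return "dispatcher(scattered)";  // simulated dispatcher call
}

// The gathered overload is a proxy: it scatters once and forwards.
std::string full_op(const TensorOptions& options) {
  return full_op_scattered(options.dtype, options.layout,
                           options.device, options.pin_memory);
}

// --- Case 2: non-c10-full op (hypothetical name) ---
// The gathered overload is the one that reaches the dispatcher.
std::string legacy_op(const TensorOptions& /*options*/) {
  return "dispatcher(gathered)";  // simulated dispatcher call
}

// The scattered overload is a proxy: it gathers once and forwards.
std::string legacy_op_scattered(std::optional<ScalarType> dtype,
                                std::optional<Layout> layout,
                                std::optional<Device> device,
                                std::optional<bool> pin_memory) {
  TensorOptions options{dtype, layout, device, pin_memory};
  return legacy_op(options);
}
```

In both cases, the overload that matches the form the dispatcher expects does no conversion at all, and the other overload converts exactly once.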
This PR is also a prerequisite for removing the re-gathering of arguments that currently happens in VariableKernel. Currently, VariableKernel gathers its arguments into a TensorOptions object to call into the C++ API. In a PR stacked on top of this one, VariableKernel will call directly into the scattered C++ API introduced here and skip the gathering step.
ghstack-source-id: 113355689
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216169815/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216169957/
Reviewed By: ezyang
Differential Revision: D23492188
fbshipit-source-id: 3e84c467545ad9371e98e09075a311bd18411c5a