[NNC] Don't inline output buffers on CPU (#49488)
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which duplicates computation when one output depends on another. That change was made to fix correctness on CUDA; it is not needed for correctness on CPU, where it causes a performance slowdown. This change disables output buffer inlining on CPU.
Output buffer inlining on CUDA is intended as an interim solution because it does not work with reductions.
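
To illustrate the duplicate computation described above, here is a minimal sketch in plain Python. It is not NNC code and does not use any PyTorch API: `run_without_inlining`, `run_with_inlining`, and `count_evaluations` are hypothetical names, and the lists stand in for output buffers. It assumes a kernel with two outputs, `a` and `b`, where `b` depends on `a`.

```python
# Illustrative sketch only: plain Python standing in for generated loops.
# All function names here are hypothetical and not part of NNC.

def run_without_inlining(f, xs):
    # Output `a` is computed once and stored in its buffer.
    a = [f(x) for x in xs]
    # Output `b` reads the already computed values from `a`'s buffer.
    b = [ai + 1.0 for ai in a]
    return a, b

def run_with_inlining(f, xs):
    # Output `a` is still computed and stored, since it is a kernel output.
    a = [f(x) for x in xs]
    # `a`'s expression is inlined into `b`, so f(x) is evaluated again.
    b = [f(x) + 1.0 for x in xs]
    return a, b

def count_evaluations(kernel, xs):
    # Run `kernel` with an instrumented f and report how often f was called.
    counter = {"n": 0}

    def f(x):
        counter["n"] += 1
        return x * 2.0

    outputs = kernel(f, xs)
    return outputs, counter["n"]
```

Both variants produce the same outputs, but for an input of three elements the inlined variant evaluates `f` six times instead of three. This is the extra work that the CPU path avoids by not inlining output buffers.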
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488
Reviewed By: ezyang
Differential Revision: D25596071
Pulled By: eellison
fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd