[NNC] Build aggregate stmt for kernel before LoopNest. (#53024)
Summary:
This PR builds a single aggregate stmt from all the tensors in the kernel before constructing the LoopNest, and migrates the kernel to the LoopNest constructor that takes a stmt and output buffers. This is one step closer to eliminating LoopNest's dependency on Tensor.
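
For illustration only, the shape of the change can be sketched with toy stand-in types. Everything below (`Buf`, `Stmt`, `Store`, `Block`, `Tensor`, `LoopNest`, `makeTensor`, `buildAggregateStmt`) is a hypothetical, simplified model written for this note and is not the real NNC API. It assumes each tensor carries its own stmt and output buffer, and it shows a LoopNest that is built from a root stmt plus output buffers without ever seeing a Tensor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-ins for illustration; NOT the real tensorexpr classes.
struct Buf {
  std::string name;
};

struct Stmt {
  virtual ~Stmt() = default;
};

// Hypothetical leaf stmt that writes a single buffer.
struct Store : Stmt {
  const Buf* buf;
  explicit Store(const Buf* b) : buf(b) {}
};

// Aggregate stmt: an ordered list of child stmts.
struct Block : Stmt {
  std::vector<std::shared_ptr<Stmt>> stmts;
};

// Assumption: a tensor pairs an output buffer with the stmt computing it.
struct Tensor {
  std::shared_ptr<Buf> buf;
  std::shared_ptr<Stmt> stmt;
};

// Hypothetical helper that creates a tensor with a one-store body.
Tensor makeTensor(const std::string& name) {
  auto buf = std::make_shared<Buf>(Buf{name});
  return Tensor{buf, std::make_shared<Store>(buf.get())};
}

// Step 1: collect the stmt of every tensor into one aggregate stmt,
// preserving the order in which the tensors were given.
std::shared_ptr<Block> buildAggregateStmt(const std::vector<Tensor>& tensors) {
  auto block = std::make_shared<Block>();
  for (const auto& t : tensors) {
    block->stmts.push_back(t.stmt);
  }
  return block;
}

// Step 2: the loop nest is constructed from a stmt and output buffers only,
// so it has no dependency on Tensor.
class LoopNest {
 public:
  LoopNest(std::shared_ptr<Stmt> root, std::unordered_set<const Buf*> outputs)
      : root_(std::move(root)), outputs_(std::move(outputs)) {
    assert(root_ != nullptr);
  }

  const std::shared_ptr<Stmt>& root_stmt() const {
    return root_;
  }

  bool isOutput(const Buf* b) const {
    return outputs_.count(b) > 0;
  }

 private:
  std::shared_ptr<Stmt> root_;
  std::unordered_set<const Buf*> outputs_;
};
```

In this sketch a caller would first call `buildAggregateStmt` on the kernel's tensors and then pass the result, together with the buffers of the output tensors, to `LoopNest`. Intermediate tensors contribute stmts to the block but are not listed as outputs.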
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53024
Reviewed By: H-Huang
Differential Revision: D26729221
Pulled By: navahgar
fbshipit-source-id: 43e972585351f6902c14b383b137aaaee3aaa3e1