[TensorExpr] Eager reduction initialization & removal from ReduceOp (#38585)
Summary:
This PR removes the deferred initializer field from ReduceOp in favour of eagerly initializing buffers when they are created (either in the `LoopNest` constructor or in `rfactor()`). This substantially simplifies the reduction logic. It removes almost all of the reduction expander, along with the ReduceInitCleaner and the unpopular NoOp node added in the last fix.
Eager initialization is better for us anyway because it exposes the initialization loop to more transformations.
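To make the difference concrete, here is a minimal Python sketch of a row-sum reduction `out[i] = sum_k a[i][k]` lowered to plain loops in both styles. The function names are hypothetical, and this is not the TensorExpr API or its generated IR. It assumes, for illustration only, that a deferred initializer behaves like a guard on the first reduction iteration, while the eager form emits a separate initialization loop when the buffer is created.

```python
# Hypothetical sketch, not the TensorExpr API: a row-sum reduction
# out[i] = sum_k a[i][k], lowered to plain loops in two styles.

def row_sum_deferred(a):
    """Initializer folded into the reduction body (deferred style, assumed)."""
    m = len(a)
    out = [None] * m  # buffer is allocated but not initialized
    for i in range(m):
        for k in range(len(a[i])):
            if k == 0:
                out[i] = 0.0  # init only happens on the first reduction iteration
            out[i] += a[i][k]
    return out


def row_sum_eager(a):
    """Buffer initialized by its own loop when it is created (eager style)."""
    m = len(a)
    out = [None] * m
    for i in range(m):  # initialization loop: an ordinary loop of its own
        out[i] = 0.0
    for i in range(m):  # reduction loop: only accumulates
        for k in range(len(a[i])):
            out[i] += a[i][k]
    return out
```

In the eager form the initialization loop is a standalone loop nest, so it can be split, fused, or reordered without touching the reduction body. The deferred sketch also shows a hazard of tying initialization to the reduction loop: for an empty row, its `out[i]` is never written.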
Added a few more tests. testReduceOverSplitWithTail failed before this change due to a bug in splitWithTail that can no longer occur.
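The commit message does not describe the splitWithTail bug itself. The following hypothetical Python sketch only illustrates why eager initialization makes splitting a reduction axis straightforward. It splits the reduction axis `k` (extent `n`) by `factor` into a main loop over `n // factor` full chunks plus a tail loop for the `n % factor` leftover iterations, in the spirit of splitWithTail. Because the buffer is initialized once before any reduction loop, the split only re-partitions the accumulation iterations. The function name and the assumption that every row has exactly `n` elements are illustrative.

```python
# Hypothetical sketch, not the TensorExpr API: split the reduction axis of a
# row sum into a main loop plus a tail loop, with eager initialization.

def row_sum_split_with_tail(a, n, factor):
    """Row sum over the first n elements of each row, with k split by factor."""
    m = len(a)
    out = [None] * m
    for i in range(m):  # eager initialization: runs once, before any reduction loop
        out[i] = 0.0
    outer = n // factor  # number of full chunks
    for i in range(m):
        for ko in range(outer):  # main loop: full chunks of size `factor`
            for ki in range(factor):
                out[i] += a[i][ko * factor + ki]
        for kt in range(n - outer * factor):  # tail loop: remaining iterations
            out[i] += a[i][outer * factor + kt]
    return out
```

Whatever the split factor, every `k` in `[0, n)` is visited exactly once and nothing in the main or tail loop resets the accumulator. The result therefore matches the unsplit reduction.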
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38585
Differential Revision: D21621551
Pulled By: nickgg
fbshipit-source-id: 378137e5723b4a6d6e390239efb12adce22a8215