[TensorExpr] Fix two bugs in Rfactor (#39268)
Summary:
The two bugs were:
* Non-reduction axes were not added when inserting the new ReduceOp, so rfactoring a reduction that also has non-reduce axes produced incorrect outputs. There were no tests of Rfactor with non-reduce axes, so I modified an existing test to cover this.
* The new statements were always prepended to the block, so writes to a buffer could end up ordered after the use of that buffer. This mostly happened when rfactoring a reduction that had already been rfactored. There was a test of this, but since it only rfactored the outer reduction axis there were never any other statements at the insertion point (the tests of the insertion point argument have the same limitation). I added a new test which covers various rfactor-axis cases.
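
For illustration, both failure modes can be modelled in plain Python. This is only a sketch of what rfactor does conceptually and does not use the TensorExpr API; the names (`reference`, `rfactored`, `rfactored_missing_axis`, `final_first`) and the shapes are hypothetical, and the way the non-reduce axis is dropped here is an assumed stand-in for the actual bug, not a reproduction of it.

```python
# Plain-Python model of rfactor; hypothetical names, not the TensorExpr API.
M, N, K = 2, 3, 4  # m is the non-reduce axis; n and k are reduce axes
A = [[[m * 100 + n * 10 + k for k in range(K)] for n in range(N)]
     for m in range(M)]


def reference():
    # b[m] = sum over n and k of A[m][n][k]
    b = [0] * M
    for m in range(M):
        for n in range(N):
            for k in range(K):
                b[m] += A[m][n][k]
    return b


def rfactored(final_first=False):
    # Rfactor on k: the temporary keeps the non-reduce axis m as well as k.
    tmp = [[0] * K for _ in range(M)]
    b = [0] * M

    def partial_reduce():
        # Writes tmp: reduces over n only.
        for m in range(M):
            for n in range(N):
                for k in range(K):
                    tmp[m][k] += A[m][n][k]

    def final_reduce():
        # Reads tmp: reduces over k into b.
        for m in range(M):
            for k in range(K):
                b[m] += tmp[m][k]

    # Assumed model of the ordering bug: if the reader runs before the
    # writer, b is computed from a tmp that has not been filled in yet.
    steps = [final_reduce, partial_reduce] if final_first else [partial_reduce, final_reduce]
    for step in steps:
        step()
    return b


def rfactored_missing_axis():
    # Assumed model of the missing-axis bug: the temporary is indexed by k
    # only, so contributions from different m are mixed together.
    tmp = [0] * K
    b = [0] * M
    for m in range(M):
        for n in range(N):
            for k in range(K):
                tmp[k] += A[m][n][k]
    for m in range(M):
        for k in range(K):
            b[m] += tmp[k]
    return b
```

With the non-reduce axis kept and the writer ordered before the reader, `rfactored()` matches `reference()`; the other two variants do not.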
Also cleaned up the tests and removed some helper code that is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39268
Differential Revision: D21864489
Pulled By: nickgg
fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c