[TensorExpr] Fix a bug in Rfactor when there are multiple reductions (#38733)
Summary:
`LoopNest::rfactor` assumes that there is only a single reduction below the insertion point, and when it replaces that reduction it recursively replaces every reduction below that point. This assumption is not safe, because a number of transformations can introduce additional `ReduceOp`s, most directly a `splitWithTail` on the innermost reduce axis.
This PR fixes that bug and adds unit tests covering the case of multiple reductions below the insertion point.
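To make the failure mode concrete, here is a minimal toy model in Python. It is not TensorExpr code: `For`, `Reduce`, `retarget_all`, `retarget_one`, and the buffer name `tmp_buf` are all hypothetical names invented for this sketch. It only assumes that splitting a reduce axis with a tail leaves a main loop and a tail loop that each hold their own reduction, and contrasts rewriting every reduction below a point with rewriting only the requested one.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Reduce:
    """Toy stand-in for a reduction that accumulates into `target`."""
    name: str
    target: str


@dataclass
class For:
    """Toy stand-in for a loop holding a list of child statements."""
    var: str
    body: List[Union["For", Reduce]] = field(default_factory=list)


def retarget_all(node, new_target):
    """Unsafe pattern: rewrite every reduction found below `node`."""
    if isinstance(node, Reduce):
        node.target = new_target
        return
    for child in node.body:
        retarget_all(child, new_target)


def retarget_one(node, wanted, new_target):
    """Safer pattern: rewrite only the reduction that was asked for."""
    if isinstance(node, Reduce):
        if node is wanted:  # identity check, not structural equality
            node.target = new_target
        return
    for child in node.body:
        retarget_one(child, wanted, new_target)


def build_nest():
    # Assumed shape after splitting the reduce axis with a tail:
    # a main loop and a tail loop, each with its own reduction.
    main = Reduce("main", "sum")
    tail = Reduce("tail", "sum")
    root = For("m", [
        For("n_outer", [For("n_inner", [main])]),
        For("n_tail", [tail]),
    ])
    return root, main, tail


# Unsafe: both reductions are redirected to the temporary buffer.
root, main, tail = build_nest()
retarget_all(root, "tmp_buf")
all_targets = (main.target, tail.target)

# Safer: only the requested reduction is redirected.
root, main, tail = build_nest()
retarget_one(root, main, "tmp_buf")
one_targets = (main.target, tail.target)
```

In this toy, `retarget_all` redirects the tail reduction as well, even though only the main reduction was requested, while `retarget_one` leaves the tail reduction accumulating into `sum`. How the actual change in `LoopNest::rfactor` selects the reduction is not shown here.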
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38733
Differential Revision: D21723634
Pulled By: nickgg
fbshipit-source-id: 3ed6ffcdc2c15aef7504f9b2b91e8d827e0b5d88