[TensorExpr] fix a bug in ReorderAxis when there are trailing loops (#38841)
Summary:
Fixes a bug in ReorderAxis where the reordered loop nest was appended to the end of the enclosing block, even when other statements followed the original nest. For example, with three Computes:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m2 ...
  for (int n2 ...
    for (int k2 ...
      Body 2
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
```
Reordering loops m2 and k2 used to also move the entire second loop nest to the end of the block, after the third:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
for (int k2 ...
  for (int n2 ...
    for (int m2 ...
      Body 2
```
This happened because the new loops were always appended to their parent block. This PR changes the logic to replace the old loop root with the new loop root in place, which preserves the order of statements in the enclosing block.
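A minimal sketch of the difference between the two strategies, assuming a plain vector of strings as a stand-in for the enclosing block. The names `Block`, `appendReordered`, and `replaceReordered` are hypothetical and are not TensorExpr APIs; the real change operates on IR statements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an enclosing block: an ordered list of
// top-level statements, each named by a string for illustration.
using Block = std::vector<std::string>;

// Old (buggy) strategy: remove the old loop root and append the reordered
// nest at the end, which moves it past any statements that followed it.
Block appendReordered(Block block,
                      const std::string& oldRoot,
                      const std::string& newRoot) {
  auto it = std::find(block.begin(), block.end(), oldRoot);
  assert(it != block.end() && "old loop root must be in the block");
  block.erase(it);
  block.push_back(newRoot);
  return block;
}

// Fixed strategy: overwrite the old loop root in place, so the reordered
// nest keeps its original position among its siblings.
Block replaceReordered(Block block,
                       const std::string& oldRoot,
                       const std::string& newRoot) {
  auto it = std::find(block.begin(), block.end(), oldRoot);
  assert(it != block.end() && "old loop root must be in the block");
  *it = newRoot;
  return block;
}
```

With a block of `{"nest1", "nest2(m,n,k)", "nest3"}`, the append strategy leaves the reordered nest last, while the replace strategy keeps it between `nest1` and `nest3`.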
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38841
Differential Revision: D21723670
Pulled By: nickgg
fbshipit-source-id: 1dee8bb153182fcaa2cabd948197577e8e80acd7