remove reordering_reindex (#127367)
Summary:
This fixes the loop ordering issue for avg_pool2d here (https://github.com/pytorch/pytorch/issues/126255#issuecomment-2117931529).
The reason we cannot fuse the 2 kernels for avg_pool2d is ComputedBuffer.iter_reordering_reindex. Take a simpler example:
```
import torch
from torch._dynamo.testing import rand_strided


def f(x, y):
    """
    Add a matmul since inductor may force layout for output.
    """
    return (x.sum(dim=-1) + 1) @ y


# Make the first 2 dimensions impossible to merge on purpose so that
# ComputedBuffer.iter_reordering_reindex will be updated.
x = rand_strided([20, 20, 30], [30, 900, 1], device="cuda")
y = torch.randn(20, 20, device="cuda")
```
Suppose the result of x.sum is stored to x2. The computed buffer for x2 remembers that we reordered its first and second dimensions (i.e. loop order [1, 0]). Later on, when we decide the loop order for x2 while computing 'x2 + 1', we pick loop order [1, 0] according to the stride analysis, and then use the saved ComputedBuffer.iter_reordering_reindex to reorder the loops a second time. The net effect is loop order [0, 1], which prevents the pointwise kernel from fusing with the reduction kernel.
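To make the "net effect" concrete, here is a minimal plain-Python sketch of how two [1, 0] reorderings compose back to the identity order. The helper `apply_order` and the variable names are hypothetical and are not Inductor code; this only illustrates the permutation arithmetic under the assumption that each reordering is applied as a simple permutation of the loop variables.

```python
def apply_order(items, order):
    """Permute items so that position i holds items[order[i]]."""
    return [items[i] for i in order]


# Hypothetical loop variables for the two non-reduction dimensions of x2.
loop_vars = ["i0", "i1"]

# Order picked by the stride analysis when computing 'x2 + 1'.
stride_order = [1, 0]
# Order remembered from when x2 itself was produced (the saved reindex).
saved_reindex = [1, 0]

after_stride = apply_order(loop_vars, stride_order)
net = apply_order(after_stride, saved_reindex)

# Swapping twice undoes the swap: net equals the original order,
# while after_stride is the order the reduction kernel actually uses.
print(after_stride, net)
```

Under this model the pointwise kernel ends up iterating in the original order while the reduction iterates in the swapped one, which matches the mismatch described above.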
I feel that we don't need ComputedBuffer.iter_reordering_reindex, and test results show that removing it has a neutral impact on the dashboard [link](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2022%20May%202024%2017%3A30%3A29%20GMT&stopTime=Wed%2C%2029%20May%202024%2017%3A30%3A29%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/shunting314/153/head&lCommit=195f42cf1a414d2d1a0422b8a081a85ff52b7d20&rBranch=main&rCommit=d6e3e89804c4063827ea21ffcd3d865e5fe365d9).
X-link: https://github.com/pytorch/pytorch/pull/127367
Approved by: https://github.com/jansel
Reviewed By: izaitsevfb
Differential Revision: D58014745
Pulled By: shunting314
fbshipit-source-id: 91f14ea313aeb4f297e259b7423107e34dafe512