benchmark
c3d510d7 - remove reordering_reindex (#127367)

Commit
1 year ago
remove reordering_reindex (#127367)

Summary:
This fixes the loop ordering issue for avg_pool2d described here (https://github.com/pytorch/pytorch/issues/126255#issuecomment-2117931529). The 2 kernels for avg_pool2d cannot be fused because of ComputedBuffer.iter_reordering_reindex.

Take a simpler example:

```
import torch
from torch._dynamo.testing import rand_strided

def f(x, y):
    """
    Add a matmul since inductor may force layout for output.
    """
    return (x.sum(dim=-1) + 1) @ y

# Make the first 2 dimensions not able to merge on purpose so that
# ComputedBuffer.iter_reordering_reindex will be updated.
x = rand_strided([20, 20, 30], [30, 900, 1], device="cuda")
# y must live on the same device as x for the matmul to run.
y = torch.randn(20, 20, device="cuda")
```

Suppose x.sum is stored to x2. The computed buffer for x2 remembers that we have reordered its first and second dimensions (i.e. loop order [1, 0]). Later on, when we decide the loop order for x2 while computing 'x2 + 1', we pick loop order [1, 0] according to the stride analysis. We then use the saved ComputedBuffer.iter_reordering_reindex to reorder the loops further. The net effect is loop order [0, 1], which prevents the pointwise kernel from fusing with the reduction kernel.

I feel that we don't need ComputedBuffer.iter_reordering_reindex, and test results show that removing it has a neutral impact on the dashboard [link](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2022%20May%202024%2017%3A30%3A29%20GMT&stopTime=Wed%2C%2029%20May%202024%2017%3A30%3A29%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/shunting314/153/head&lCommit=195f42cf1a414d2d1a0422b8a081a85ff52b7d20&rBranch=main&rCommit=d6e3e89804c4063827ea21ffcd3d865e5fe365d9)

X-link: https://github.com/pytorch/pytorch/pull/127367
Approved by: https://github.com/jansel

Reviewed By: izaitsevfb

Differential Revision: D58014745

Pulled By: shunting314

fbshipit-source-id: 91f14ea313aeb4f297e259b7423107e34dafe512
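The double reordering described in the summary can be pictured with a toy model. This is only a sketch: `apply_order`, `loop_vars`, `stride_order`, and `saved_reindex` are hypothetical names chosen for illustration and are not Inductor APIs. It assumes a loop order is a permutation applied to a list of loop variables.

```python
def apply_order(items, order):
    """Return a new list whose position i holds items[order[i]]."""
    return [items[i] for i in order]


# Hypothetical loop variables for the two non-reduced dimensions of x2.
loop_vars = ["i0", "i1"]

# Order picked by the stride analysis, as stated in the commit message.
stride_order = [1, 0]

# Stand-in for the reordering remembered on the computed buffer.
saved_reindex = [1, 0]

# Step 1: the stride analysis swaps the two loops.
after_stride = apply_order(loop_vars, stride_order)

# Step 2: applying the saved reordering swaps them again.
final = apply_order(after_stride, saved_reindex)

# Two swaps cancel, so the loops end up in the original [0, 1] order.
print(after_stride, final)
```

Under this model the second permutation undoes the first, which matches the commit's observation that the net loop order is [0, 1] rather than the [1, 0] chosen by the stride analysis.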