llvm-project
3092b765 - [mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#147620)

Commit
177 days ago
[mlir][vector] Refactor WarpOpScfForOp to support unused or swapped forOp results. (#147620) Current implementation generates incorrect code or crashes in the following valid cases. 1. At least one of the for op results are not yielded by the warpOp. Example: ``` %0 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>) { .... %3:2 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini, %arg5 = %ini1) -> (vector<128xf32>, vector<128xf32>) { %1 = ... %acc = .... scf.yield %acc, %1 : vector<128xf32>, vector<128xf32> } gpu.yield %3#0 : vector<128xf32> // %3#1 is not used but can not be removed as dead code (loop carried). } "some_use"(%0) : (vector<4xf32>) -> () return ``` 2. Enclosing warpOp yields the forOp results in different order compared to the forOp results. Example: ``` %0:3 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>, vector<4xf32>, vector<8xf32>) { .... %3:3 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini1, %arg5 = %ini2, %arg6 = %ini3) -> (vector<256xf32>, vector<128xf32>, vector<128xf32>) { ..... scf.yield %acc1, %acc2, %acc3 : vector<256xf32>, vector<128xf32>, vector<128xf32> } gpu.yield %3#2, %3#1, %3#0 : vector<128xf32>, vector<128xf32>, vector<256xf32> // swapped order } "some_use_1"(%0#0) : (vector<4xf32>) -> () "some_use_2"(%0#1) : (vector<4xf32>) -> () "some_use_3"(%0#2) : (vector<8xf32>) -> () ```
Author
Parents
Loading