Implement conditional statements in kernel analysis (#119664)
This PR changes `ops` from a dict mapping RET => OP to a dict mapping RET => List[OP], since multiple OPs can now return the same RET (one per branch of a conditional). In real execution only one of these OPs will run, so there is no need to worry about renaming. For analysis, we pessimistically assume that any one of them could have been executed, which is the safest assumption for analysis purposes.
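A minimal Python sketch of the data-structure change (the names `ops` and the op-dict layout here are illustrative, not the actual implementation):

```python
from collections import defaultdict

# Before: ops was RET => OP (one defining op per result value).
# After: each RET maps to a list of candidate OPs, because ops in
# different branches of an scf.if can yield the same result value.
ops = defaultdict(list)

# The two branches of the second TTIR example below both define %14:
ops["%14"].append({"name": "arith.addf", "args": ["%8", "%11"]})
ops["%14"].append({"name": "arith.mulf", "args": ["%8", "%11"]})

# At runtime only one branch executes, so only one op actually
# produces %14; the analysis must keep all candidates.
assert len(ops["%14"]) == 2
```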
Example TTIRs that can now be handled:
```
scf.if %13 {
  %14 = tt.get_program_id y : i32 loc(#loc13)
  %c0_i32_1 = arith.constant 0 : i32 loc(#loc14)
  %15 = arith.cmpi eq, %14, %c0_i32_1 : i32 loc(#loc14)
  scf.if %15 {
    %16 = arith.addf %8, %11 : tensor<4xf32> loc(#loc16)
    %17 = tt.splat %arg2 : (!tt.ptr<f32, 1>) -> tensor<4x!tt.ptr<f32, 1>> loc(#loc17)
    %18 = tt.addptr %17, %4 : tensor<4x!tt.ptr<f32, 1>>, tensor<4xi32> loc(#loc17)
    tt.store %18, %16, %5 {cache = 1 : i32, evict = 1 : i32} : tensor<4xf32> loc(#loc18)
  } else {
  } loc(#loc15)
} else {
} loc(#loc12)
```
and
```
%14 = scf.if %13 -> (tensor<4xf32>) {
  %17 = arith.addf %8, %11 : tensor<4xf32> loc(#loc13)
  scf.yield %17 : tensor<4xf32> loc(#loc13)
} else {
  %17 = arith.mulf %8, %11 : tensor<4xf32> loc(#loc14)
  scf.yield %17 : tensor<4xf32> loc(#loc14)
} loc(#loc12)
```
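In the second example, `%14` is defined by either `arith.addf` or `arith.mulf` depending on `%13`. A hedged sketch of the pessimistic treatment (the helper `possible_defs` and the op-dict layout are hypothetical):

```python
def possible_defs(ops, ret):
    """Return every op that could have defined `ret`.

    `ops` maps RET -> list of candidate defining ops; since the
    analysis cannot know which branch ran, it takes all of them.
    """
    return {op["name"] for op in ops[ret]}

# %14 from the second TTIR example: one candidate per branch.
ops = {"%14": [{"name": "arith.addf"}, {"name": "arith.mulf"}]}
assert possible_defs(ops, "%14") == {"arith.addf", "arith.mulf"}
```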
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119664
Approved by: https://github.com/aakhundov