[TensorExpr] Enable inlining for output tensors too. (#48967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48967
We previously didn't inline output tensors, which resulted in correctness
issues like #48533. This PR allows inlining for output tensors too.
This could result in duplicated computation, but we can address that
later, once correctness is ensured.
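To illustrate the trade-off, here is a conceptual sketch (plain Python, not the actual TensorExpr API or this PR's code): inlining a producer expression into each of its consumers removes the intermediate buffer, at the cost of re-evaluating the producer once per use.

```python
# Conceptual sketch of inlining vs. buffering a producer expression.
# Names and structure are illustrative only, not TensorExpr internals.

def with_buffer(a, b):
    # Producer materialized once into a temporary, then reused.
    t = [x + y for x, y in zip(a, b)]            # computed once
    out1 = [v * 2 for v in t]
    out2 = [v * 3 for v in t]
    return out1, out2

def with_inlining(a, b):
    # The producer (a + b) is inlined into every consumer, so the
    # addition is evaluated twice: duplicated work, but no temporary
    # buffer between producer and consumers.
    out1 = [(x + y) * 2 for x, y in zip(a, b)]
    out2 = [(x + y) * 3 for x, y in zip(a, b)]
    return out1, out2

a, b = [1, 2, 3], [4, 5, 6]
assert with_buffer(a, b) == with_inlining(a, b)
```

Both forms produce the same values; the benchmark numbers below confirm that enabling inlining for outputs does not noticeably regress performance.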
Performance results on FastRNNS:
Before the fix:
```
Benchmarking LSTMs...
name              avg_fwd   std_fwd   avg_bwd   std_bwd
cudnn               10.09   0.05431     17.55    0.2108
aten                21.52    0.1276      26.7     1.471
jit                 13.25    0.8748     22.47      1.73
jit_premul          11.43    0.3226     19.43     2.231
jit_premul_bias     11.84    0.2245     20.33     2.205
jit_simple          13.27    0.9906     22.15    0.9724
jit_multilayer      13.38    0.8748     22.82      1.01
py                  33.55     4.837     46.41     6.333
```
After the fix:
```
Benchmarking LSTMs...
name              avg_fwd   std_fwd   avg_bwd   std_bwd
cudnn               10.09   0.05979     17.45    0.1987
aten                21.21     0.144     26.43    0.7356
jit                 13.01    0.2925     23.21    0.8454
jit_premul           11.4    0.3905     19.62     2.448
jit_premul_bias     11.85    0.2461     20.29    0.6592
jit_simple          13.08    0.8533     22.81     1.315
jit_multilayer      12.93    0.1095     23.57     1.459
py                  31.21     2.783     44.63     6.073
```
Differential Revision: D25383949
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Pulled By: ZolotukhinM
fbshipit-source-id: 16f5727475109a278499bef7905f6aad18c8527a
Author: Mikhail Zolotukhin