Use contiguous() to handle noncontiguous outputs during elementwise decomposition (#108140)
Fixes https://github.com/pytorch/pytorch/issues/108218
This change uses the contiguous() API to handle noncontiguous outputs during elementwise decomposition.
With it, the op decomposes properly (test case from the bug):
```
graph():
    %arg0_1 : [#users=3] = placeholder[target=arg0_1]
    %abs_1 : [#users=1] = call_function[target=torch.ops.aten.abs.default](args = (%arg0_1,), kwargs = {})
    %floor : [#users=1] = call_function[target=torch.ops.aten.floor.default](args = (%abs_1,), kwargs = {})
    %sign : [#users=1] = call_function[target=torch.ops.aten.sign.default](args = (%arg0_1,), kwargs = {})
    %mul : [#users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%floor, %sign), kwargs = {})
    %sub : [#users=1] = call_function[target=torch.ops.aten.sub.Tensor](args = (%arg0_1, %mul), kwargs = {})
    return (sub,)
```
Output:
```
tensor([[ 0.2871,  0.7189,  0.7297],
        [ 0.8782, -0.4899,  0.7055]], device='hpu:0')
```
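For context, "contiguous" here refers to a tensor whose strides match the dense row-major layout for its shape; a decomposition that writes elementwise results assuming such a layout can produce wrong values when handed a noncontiguous view, which is why materializing with contiguous() first is the fix. The following is a minimal pure-Python sketch of that stride condition (illustrative only; the helper names are hypothetical, not PyTorch internals):

```python
# Illustrative sketch (not the actual PyTorch implementation): a tensor is
# C-contiguous when its strides equal the row-major strides for its shape.
def contiguous_strides(shape):
    """Compute row-major strides (in elements) for a given shape."""
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def is_contiguous(shape, strides):
    """True if (shape, strides) describes a dense C-contiguous layout."""
    return tuple(strides) == contiguous_strides(shape)

# A 2x3 tensor laid out row-major is contiguous:
print(is_contiguous((2, 3), (3, 1)))   # True
# Its transpose view (same storage, swapped strides) is not, so an
# elementwise decomposition that assumes dense row-major output must
# call .contiguous() to materialize a dense copy first:
print(is_contiguous((3, 2), (1, 3)))   # False
```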
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108140
Approved by: https://github.com/ezyang