[Static Runtime] Add sign/abs/lop1p/mul fusion pass (#64209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209
Add a new fusion pass that turns transforms the following pattern:
```
graph(%input):
%0 : Tensor = aten::sign(%input)
%1 : Tensor = aten::abs(%input)
%2 : Tensor = aten::log1p(%1)
%res : Tensor = aten::mul(%0, %2)
return (%res)
```
Into a single op:
```
graph(%input):
%res : Tensor = static_runtim::signed_log1p(%input)
return (%res)
```
The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.
Followup: navahgar will add an NNC kernel which is faster than the the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Test passed with new graph pass disabled and enabled.
Reviewed By: hlu1
Differential Revision: D30559929
fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a