[inductor] decompose tanh in CPP backend (#91687)
## Description
The following decomposition of `tanh` was removed in https://github.com/pytorch/pytorch/pull/90889:
```python
@register_decomposition([aten.tanh])
def tanh(x):
    return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0
```
After the removal, we observed a performance regression on CPU for the `lennard_jones` model in the TorchBench suite.
This PR restores the decomposition of `tanh` in the CPP backend to fix the regression.
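For context, the decomposition above is the standard algebraic identity `tanh(x) = 2 / (1 + exp(-2x)) - 1` (equivalently `2 * sigmoid(2x) - 1`). A minimal standalone sketch, using only the Python standard library rather than PyTorch, verifying that the rewritten form matches `tanh` numerically:

```python
import math

def tanh_decomposed(x):
    # tanh(x) rewritten as 2 / (1 + exp(-2x)) - 1,
    # the same form as the removed decomposition above
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

# The two forms agree to floating-point precision over a range of inputs
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(tanh_decomposed(x) - math.tanh(x)) < 1e-12
```

The decomposition lets the CPP backend emit cheap elementwise `exp`, add, and divide ops instead of a `tanh` call, which is where the speedup comes from.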
### Performance
- Model: lennard_jones
- Machine: IceLake (32 cores per socket)
- Configuration: single instance, 32 cores per instance
- jemalloc and iomp enabled
```bash
python benchmarks/dynamo/torchbench.py --inductor-settings --inductor --performance --float32 -dcpu -n500 --no-skip --dashboard --only=lennard_jones --quiet
```
Time before regression (s) | Time after regression (s) | Time with this PR (s)
-- | -- | --
0.000262036 | 0.0003618 | 0.000267888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91687
Approved by: https://github.com/jgong5, https://github.com/desertfire