[inductor] Lower torch.exp2 and use it for torch.pow(2, x) (#92632)
Before
```python
tmp0 = 2.0
tmp2 = tl.libdevice.pow(tmp0, tmp1)
```
After
```python
tmp1 = tl.libdevice.exp2(tmp0)
```
I benchmarked on CPU and CUDA with the following examples:
```python
@torch._dynamo.optimize()
def exp2(x):
    return torch.pow(2, x)

@torch._dynamo.optimize()
def logaddexp2(a, b):
    m = torch.maximum(a, b)
    return m + torch.log2(1 + torch.pow(2, -torch.abs(a - b)))
```
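As a sanity check (not part of the PR), the `logaddexp2` benchmark above relies on the standard stable formulation `log2(2**a + 2**b) = max(a, b) + log2(1 + 2**(-|a - b|))`; a minimal pure-Python sketch of the same identity:

```python
import math

def logaddexp2(a, b):
    # Stable form of log2(2**a + 2**b), mirroring the benchmark above.
    m = max(a, b)
    return m + math.log2(1 + 2.0 ** (-abs(a - b)))

# Compare against the direct (overflow-prone) formula on moderate inputs.
for a, b in [(0.0, 0.0), (3.0, -1.0), (10.0, 10.0)]:
    assert math.isclose(logaddexp2(a, b), math.log2(2.0 ** a + 2.0 ** b))
```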
On CUDA, Triton is able to specialize `pow(2, x)` so this change makes
no difference, but on CPU I see a surprisingly large speedup.
| Device | Function   | Master (us) | This PR (us) | Speedup |
|--------|------------|-------------|--------------|---------|
| CUDA   | exp2       | 64          | 63           | 1.0     |
|        | logaddexp2 | 109         | 107          | 1.0     |
| CPU    | exp2       | 220         | 40           | 5.5     |
|        | logaddexp2 | 282         | 140          | 2.0     |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92632
Approved by: https://github.com/lezcano, https://github.com/ngimel