Convert floating-point constants to T in Bessel functions (#59416)
Summary:
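If T is float, many of the computations are more expensive than
expected: floating-point literals such as `0.5` have type `double` in C++,
so an expression that mixes one with a float operand is evaluated in double
precision. Compilers generally will not narrow such an expression back to
float on their own, because doing so can change the result. Converting the
constants to T before using them removes that ambiguity.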
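To illustrate why a compiler cannot silently evaluate `x * 0.1` in single precision when `x` is a float, the Python sketch below emulates IEEE-754 binary32 rounding with `struct`. It assumes round-to-nearest and ignores overflow and subnormals. `mixed` models the C++ usual arithmetic conversions (the float operand is widened, the product is computed in double, then rounded back to float), and `narrowed` models multiplying by `T(0.1)`. The helper names and the constant `0.1` are hypothetical and are not taken from the PyTorch sources.

```python
import struct


def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', v))[0]


C = 0.1  # hypothetical unsuffixed literal: a double in C++


def mixed(x: float) -> float:
    # float * double: x is widened, the product is computed in double,
    # then rounded back to float when stored in a float variable.
    return to_f32(x * C)


def narrowed(x: float) -> float:
    # float * T(0.1): the constant is rounded to float first. The exact
    # product of two binary32 values fits in binary64, so rounding the
    # double product to float yields the correctly rounded float product.
    return to_f32(x * to_f32(C))


# Count float inputs for which the two evaluation orders disagree.
inputs = [to_f32(float(i)) for i in range(1, 100_000)]
diffs = [x for x in inputs if mixed(x) != narrowed(x)]
print(f'{len(diffs)} of {len(inputs)} inputs round differently')
```

For example, `mixed(13.0)` and `narrowed(13.0)` return neighbouring float values, so replacing the double constant with its float counterpart is a change in semantics, not just an optimization. That is why the conversion has to be written explicitly in the source.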
Benchmark: (Debian 11, no turbo, Release build, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, gcc 10.2.1)
```python
import timeit

# Time each Bessel function on float32 inputs of two sizes:
# (input length n, number of repetitions t).
for dtype in ('torch.float',):
    for func in ('i0', 'i0e', 'i1', 'i1e'):
        for n, t in [(10_000, 10000),
                     (100_000, 1000)]:
            print(f'torch.special.{func}(torch.arange(n, dtype=torch.float32)), n = {n} for {t} times, dtype={dtype}')
            print(timeit.timeit(f'torch.special.{func}(a)', setup=f'import torch; a = torch.arange({n}, dtype=torch.float32)', number=t))
```
Before:
```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.539132010017056
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.9613071230123751
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.32450835997588
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5751779029960744
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.0810036820184905
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.5314770240220241
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.41711462699458934
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.1759720179834403
```
After:
```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.337154256994836
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.8640981369826477
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.308618158014724
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5217605629877653
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.9398589830088895
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.4667845010117162
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.3658539849857334
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.15680673700990155
```
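The Before/After timings can be condensed into per-case speedups. This is a minimal sketch that copies the printed numbers verbatim and computes before/after ratios. The dictionary layout keyed by `(function, n)` is only an illustrative choice.

```python
# Timings copied verbatim from the Before/After output, keyed by (function, n).
before = {
    ('i0', 10_000): 1.539132010017056,
    ('i0', 100_000): 0.9613071230123751,
    ('i0e', 10_000): 4.32450835997588,
    ('i0e', 100_000): 1.5751779029960744,
    ('i1', 10_000): 1.0810036820184905,
    ('i1', 100_000): 0.5314770240220241,
    ('i1e', 10_000): 0.41711462699458934,
    ('i1e', 100_000): 0.1759720179834403,
}
after = {
    ('i0', 10_000): 1.337154256994836,
    ('i0', 100_000): 0.8640981369826477,
    ('i0e', 10_000): 4.308618158014724,
    ('i0e', 100_000): 1.5217605629877653,
    ('i1', 10_000): 0.9398589830088895,
    ('i1', 100_000): 0.4667845010117162,
    ('i1e', 10_000): 0.3658539849857334,
    ('i1e', 100_000): 0.15680673700990155,
}

# A speedup above 1 means the "After" build is faster for that case.
speedups = {key: before[key] / after[key] for key in before}

for (func, n), s in speedups.items():
    print(f'{func:>4}  n = {n}  speedup = {s:.3f}x')
```

Every case gets faster. i0e improves least (about 0.4% at n = 10000 and 3.5% at n = 100000), while i0, i1 and i1e improve by roughly 11 to 15%.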
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59416
Reviewed By: anjali411
Differential Revision: D29249897
Pulled By: mruberry
fbshipit-source-id: c170e78f2ab47176ea95b8442c6279d7ec1d75c2