Migrate log_sigmoid (forward and backward) to ATen (CUDA) (#60881)
Summary:
Fixes gh-24591, fixes gh-24590, closes gh-39642
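For reference, the math behind the two ported ops can be sketched in scalar Python. This is a minimal sketch assuming the standard numerically stable formulation log_sigmoid(x) = min(x, 0) - log1p(exp(-|x|)), whose derivative is sigmoid(-x). The function names are hypothetical, and this is not the ATen CUDA kernel code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)).

    exp() only ever sees a non-positive argument, so it cannot overflow,
    and large negative x no longer hits log(0).
    """
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def log_sigmoid_backward(grad_output: float, x: float) -> float:
    """Chain rule: d/dx log(sigmoid(x)) = 1 - sigmoid(x) = sigmoid(-x)."""
    # Pick the form of sigmoid(-x) whose exp() argument is non-positive.
    if x >= 0:
        z = math.exp(-x)
        sig_neg = z / (1.0 + z)
    else:
        sig_neg = 1.0 / (1.0 + math.exp(x))
    return grad_output * sig_neg
```

For example, `log_sigmoid(-1000.0)` returns `-1000.0`, whereas the naive `math.log(1 / (1 + math.exp(1000.0)))` raises `OverflowError`.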
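Benchmarks were run with nvprof on contiguous inputs. The new kernels are faster at every size. The gain is small at 10^4 elements (about 2%) and grows to roughly 1.46x (forward) and 1.50x (backward) at 10^8 elements.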
#### Forward benchmarks
| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
| 10^4 | 2.5840 | 2.5230 |
| 10^5 | 4.6410 | 3.9280 |
| 10^6 | 33.772 | 23.025 |
| 10^7 | 299.67 | 206.35 |
| 10^8 | 3001.9 | 2052.8 |
#### Backward benchmarks
| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
| 10^4 | 2.7750 | 2.7080 |
| 10^5 | 5.2430 | 3.9010 |
| 10^6 | 46.198 | 32.878 |
| 10^7 | 447.18 | 296.18 |
| 10^8 | 4393.2 | 2938.0 |
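The speedup per size can be derived directly from the two tables (master time divided by new time). The snippet below only re-does that arithmetic on the numbers reported above. The dictionary and function names are illustrative.

```python
# Timings in microseconds, copied from the forward and backward tables:
# num_elements -> (master_us, this_pr_us)
FORWARD = {
    10**4: (2.5840, 2.5230),
    10**5: (4.6410, 3.9280),
    10**6: (33.772, 23.025),
    10**7: (299.67, 206.35),
    10**8: (3001.9, 2052.8),
}
BACKWARD = {
    10**4: (2.7750, 2.7080),
    10**5: (5.2430, 3.9010),
    10**6: (46.198, 32.878),
    10**7: (447.18, 296.18),
    10**8: (4393.2, 2938.0),
}


def speedups(table):
    """Return {num_elements: master_time / pr_time} for each row."""
    return {n: master / pr for n, (master, pr) in table.items()}


if __name__ == "__main__":
    for name, table in (("forward", FORWARD), ("backward", BACKWARD)):
        for n, s in speedups(table).items():
            print(f"{name:>8} {n:>10}: {s:.2f}x")
```

At 10^8 elements this works out to about 1.46x for forward and 1.50x for backward, while the 10^4 rows improve by only about 2%.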
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60881
Reviewed By: mruberry
Differential Revision: D29589455
Pulled By: ngimel
fbshipit-source-id: 70cd5db244bf6292e9ca367462640530a1d85f7d