Adding a bunch of unary foreach APIs (#47875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47875
Implementing several unary operators for _foreach_ APIs.
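For illustration, a minimal sketch of how one of these ops compares with the equivalent per-tensor loop. It assumes a PyTorch build that exposes `torch._foreach_sin` and uses small CPU tensors so it runs without a GPU; the variable names are illustrative only.

```python
import torch

# Small CPU tensors keep the sketch runnable without a GPU.
tensors = [torch.rand(2, 3) for _ in range(4)]

# Per-tensor loop: one operator call per tensor.
looped = [torch.sin(t) for t in tensors]

# foreach variant: a single call over the whole list,
# returning one new tensor per input tensor.
fused = torch._foreach_sin(tensors)

# Both paths should agree element-wise.
matches = all(torch.allclose(a, b) for a, b in zip(looped, fused))
print(matches)
```

The `_foreach_` functions are underscore-prefixed (private) APIs, so their names and behavior may change between releases.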
### Planned list of ops
- [x] abs
- [x] acos
- [x] asin
- [x] atan
- [x] ceil
- [x] cos
- [x] cosh
- [x] erf
- [x] erfc
- [x] exp
- [x] expm1
- [x] floor
- [x] log
- [x] log10
- [x] log1p
- [x] log2
- [ ] frac
- [x] neg
- [ ] reciprocal
- [x] round
- [ ] rsqrt
- [ ] sigmoid
- [x] sin
- [x] sinh
- [x] sqrt
- [x] tan
- [x] tanh
- [ ] trunc
- [x] lgamma
- [ ] digamma
- [ ] erfinv
- [ ] sign
- [ ] mvlgamma
- [ ] clamp
- [ ] clamp_min
- [ ] clamp_max
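### Perf results
For each op, the first time is the median for the per-tensor loop (`[torch.{op}(t) for t in inputs]`) and the second is the median for `torch._foreach_{op}(inputs)`, in the order the script below prints them.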
```
----------------- OP: sin -----------------
Median: 998.79 us
300.84 us
----------------- OP: abs -----------------
Median: 1.19 ms
294.97 us
----------------- OP: acos -----------------
Median: 982.30 us
299.40 us
----------------- OP: asin -----------------
Median: 1.16 ms
298.09 us
----------------- OP: atan -----------------
Median: 986.92 us
295.64 us
----------------- OP: ceil -----------------
Median: 1.17 ms
297.25 us
----------------- OP: cos -----------------
Median: 972.72 us
294.41 us
----------------- OP: cosh -----------------
Median: 1.17 ms
294.97 us
----------------- OP: erf -----------------
Median: 1.17 ms
297.02 us
----------------- OP: erfc -----------------
Median: 1.14 ms
299.23 us
----------------- OP: exp -----------------
Median: 1.15 ms
298.79 us
----------------- OP: expm1 -----------------
Median: 1.17 ms
291.79 us
----------------- OP: floor -----------------
Median: 1.17 ms
293.51 us
----------------- OP: log -----------------
Median: 1.13 ms
318.01 us
----------------- OP: log10 -----------------
Median: 987.17 us
295.57 us
----------------- OP: log1p -----------------
Median: 1.13 ms
297.15 us
----------------- OP: log2 -----------------
Median: 974.21 us
295.01 us
----------------- OP: frac -----------------
Median: 1.15 ms
296.01 us
----------------- OP: neg -----------------
Median: 1.13 ms
294.98 us
----------------- OP: reciprocal -----------------
Median: 1.16 ms
293.69 us
----------------- OP: round -----------------
Median: 1.12 ms
297.48 us
----------------- OP: sigmoid -----------------
Median: 1.13 ms
296.53 us
----------------- OP: sin -----------------
Median: 991.02 us
295.78 us
----------------- OP: sinh -----------------
Median: 1.15 ms
295.70 us
----------------- OP: sqrt -----------------
Median: 1.17 ms
297.75 us
----------------- OP: tan -----------------
Median: 978.20 us
297.99 us
----------------- OP: tanh -----------------
Median: 967.84 us
297.29 us
----------------- OP: trunc -----------------
Median: 1.14 ms
298.72 us
----------------- OP: lgamma -----------------
Median: 1.14 ms
317.53 us
```
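As a rough reading of these numbers, the sketch below computes the speedup for a few ops. It assumes that the first time in each pair is the per-tensor loop and the second is the `_foreach_` call, matching the order in which the benchmark script prints them. The helper `to_microseconds` and the `samples` dictionary are illustrative names, and the values are copied from the results above.

```python
def to_microseconds(text):
    """Convert a string such as '1.19 ms' or '300.84 us' to microseconds."""
    value, unit = text.split()
    scale = {"us": 1.0, "ms": 1000.0}[unit]
    return float(value) * scale

# (loop median, foreach median) for a few ops, copied from the results above.
# Assumption: the first time is the loop, the second is the foreach call.
samples = {
    "sin": ("998.79 us", "300.84 us"),
    "abs": ("1.19 ms", "294.97 us"),
    "log": ("1.13 ms", "318.01 us"),
    "lgamma": ("1.14 ms", "317.53 us"),
}

speedups = {
    op: to_microseconds(loop) / to_microseconds(fused)
    for op, (loop, fused) in samples.items()
}

for op, ratio in speedups.items():
    print(f"{op}: {ratio:.2f}x")
```

Under that assumption, the sampled ops come out at roughly 3.3x to 4x faster with the foreach variant for this input size.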
### Script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

# 100 CUDA tensors of shape (3, 200, 200), shared by both timed statements.
inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
        "sin", "abs", "acos", "asin", "atan", "ceil",
        "cos", "cosh", "erf", "erfc",
        "exp", "expm1", "floor", "log",
        "log10", "log1p", "log2", "frac",
        "neg", "reciprocal", "round",
        "sigmoid", "sin", "sinh", "sqrt",
        "tan", "tanh", "trunc", "lgamma"
    ]:
        print("\n\n----------------- OP: ", op, " -----------------")

        # Baseline: apply the op to each tensor in a Python loop.
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op=op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        # foreach variant: a single call over the whole tensor list.
        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op=op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D24948801
Pulled By: izdeby
fbshipit-source-id: defec3c0394d6816d9a8b05a42a057348f1b4d96