[MicroBench] Added a log_vml version of the signed log1p kernel (#64205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205
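The log_vml version of the micro-benchmark is over **2x** faster than the NNC log1p version (19560 ns vs. 40469 ns, about 2.07x) and about 2.35x faster than the ATen version (45915 ns). Here are the perf numbers: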
```
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467           45915 ns        45908 ns        14506 GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467            40469 ns        40466 ns        17367 GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467      19560 ns        19559 ns        35902 GB/s=6.00016G/s
```
Thanks to bertmaher for pointing this out.
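
For readers unfamiliar with the kernel, here is a minimal scalar sketch of the two formulations being compared. It assumes that "signed log1p" means `sign(x) * log1p(|x|)` and that the log_vml variant computes `log(1 + |x|)` with a plain log. Both are assumptions, since this message does not spell out the kernel. The function names are hypothetical, and this is not the benchmark code itself.

```python
import math


def signed_log1p(x: float) -> float:
    """Assumed reference form: sign(x) * log1p(|x|)."""
    # copysign attaches the sign of x to the non-negative magnitude.
    return math.copysign(math.log1p(abs(x)), x)


def signed_log1p_via_log(x: float) -> float:
    """Assumed alternative form: sign(x) * log(1 + |x|), using plain log."""
    return math.copysign(math.log(1.0 + abs(x)), x)


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, signed_log1p(x), signed_log1p_via_log(x))
```

The two forms agree closely for moderate inputs. For very small `|x|`, `1 + |x|` rounds to `1` in floating point, so the plain-log form loses precision relative to `log1p`. Whether that matters depends on the input range of the workload.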
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30644716
Pulled By: navahgar
fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19