pytorch
31584d06 - [Static Runtime] Added NNC implementation for signed log1p kernel. (#65387)

Commit
4 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387

Added a customized NNC implementation for the signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.

Also added an SR microbenchmark for this kernel, which shows the performance improvement.

Without fusion:
```
--------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16              1953 ns         1953 ns       358746
BM_signed_log1p/64              2049 ns         2049 ns       342145
BM_signed_log1p/512             3291 ns         3291 ns       214342
BM_signed_log1p/4096           15559 ns        15559 ns        44420
BM_signed_log1p/32768         101936 ns       101935 ns         6843
BM_signed_log1p/65536         194792 ns       194789 ns         3615
```

With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16               369 ns          369 ns      1896179
BM_signed_log1p/64               497 ns          497 ns      1406995
BM_signed_log1p/512             1618 ns         1618 ns       430209
BM_signed_log1p/4096           11327 ns        11326 ns        61463
BM_signed_log1p/32768          84099 ns        84086 ns         8325
BM_signed_log1p/65536         166531 ns       166510 ns         4186
```

This shows a >15% performance improvement for this kernel with NNC fusion at every benchmarked size.
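The improvement figure can be checked directly against the two benchmark tables. The following is a minimal sketch, not part of the commit: the times are copied from the `Time` columns above, and the helper names `speedup` and `time_reduction_pct` are hypothetical.

```python
# Wall-clock times in ns, copied from the "Time" column of the two tables.
unfused_ns = {16: 1953, 64: 2049, 512: 3291,
              4096: 15559, 32768: 101936, 65536: 194792}
fused_ns = {16: 369, 64: 497, 512: 1618,
            4096: 11327, 32768: 84099, 65536: 166531}


def speedup(size):
    """Ratio of unfused time to fused time (hypothetical helper)."""
    return unfused_ns[size] / fused_ns[size]


def time_reduction_pct(size):
    """Percentage of the unfused time saved by fusion (hypothetical helper)."""
    return 100.0 * (unfused_ns[size] - fused_ns[size]) / unfused_ns[size]


for size in sorted(unfused_ns):
    print(f"{size:>6}: {speedup(size):.2f}x faster, "
          f"{time_reduction_pct(size):.1f}% less time")
```

The gain is largest for small inputs and shrinks as the input size grows, so the smallest speedup in the table is the one at size 65536.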
On the inline_cvr local model, there is a small improvement in the profiled time spent on ops:
without fusion: `0.9%` (computed by adding the % spent on all 4 ops involved)
with NNC fusion: `0.55%`

Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Also ran the accuracy test with inline_cvr as described here, https://fb.quip.com/qmdDAJzEmPtf, on the full size model (285298536_1):

```
get 57220 prediction values
get 57220 prediction values
max_error: 0
total: 0
```

Reviewed By: hlu1

Differential Revision: D30609492

fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
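For readers unfamiliar with the kernel, the sketch below illustrates what fusing it means. It is written in plain Python rather than NNC, and it rests on two assumptions the commit message does not state: that signed log1p computes `sign(x) * log1p(|x|)`, and that the four unfused ops are sign, abs, log1p, and mul. The function names are hypothetical.

```python
import math


def signed_log1p_unfused(xs):
    """Four separate passes, each producing an intermediate list.

    Assumed op sequence (not confirmed by the commit): sign, abs, log1p, mul.
    """
    signs = [(x > 0) - (x < 0) for x in xs]       # sign: -1, 0, or 1
    mags = [abs(x) for x in xs]                   # abs
    logs = [math.log1p(m) for m in mags]          # log1p
    return [s * l for s, l in zip(signs, logs)]   # mul


def signed_log1p_fused(xs):
    """One pass over the input with no intermediate lists."""
    out = []
    for x in xs:
        sign = (x > 0) - (x < 0)
        out.append(sign * math.log1p(abs(x)))
    return out


sample = [-3.0, -0.5, 0.0, 0.5, 3.0]
print(signed_log1p_fused(sample))
```

The fused version touches each element once and allocates no temporaries, which is one plausible reason the measured gain is largest at small sizes, where per-op overhead dominates.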