Add statistics on out variant nodes to static runtime
Summary: Added more statistics to the static runtime output, covering out variant nodes.
Test Plan:
caffe2/benchmarks/static_runtime:static_runtime_cpptest
Expected output example:
Static runtime ms per iter: 0.939483. Iters per second: 1064.41
Node #0: 0.195671 ms/iter, %wide_offset.1 : Tensor = aten::add(%wide.1, %self._mu, %4)
Node #1: 0.169457 ms/iter, %wide_normalized.1 : Tensor = aten::mul(%wide_offset.1, %self._sigma)
Node #2: 0.118218 ms/iter, %wide_preproc.1 : Tensor = aten::clamp(%wide_normalized.1, %5, %6)
Node #3: 0.038814 ms/iter, %user_emb_t.1 : Tensor = aten::transpose(%user_emb.1, %4, %7)
Node #4: 0.0860747 ms/iter, %dp_unflatten.1 : Tensor = aten::bmm(%ad_emb_packed.1, %user_emb_t.1)
Node #5: 0.0102666 ms/iter, %31 : Tensor = static_runtime::flatten_copy(%dp_unflatten.1, %4, %8)
Node #6: 0.000476333 ms/iter, %19 : Tensor[] = prim::ListConstruct(%31, %wide_preproc.1)
Node #7: 0.0707332 ms/iter, %input.1 : Tensor = aten::cat(%19, %4)
Node #8: 0.123695 ms/iter, %fc1.1 : Tensor = aten::addmm(%self._fc_b, %input.1, %29, %4, %4)
Node #9: 0.0309244 ms/iter, %23 : Tensor = aten::sigmoid(%fc1.1)
Node #10: 0.0046297 ms/iter, %24 : (Tensor) = prim::TupleConstruct(%23)
Time per node type:
0.195671 ms. 23.0483%. aten::add (1 nodes)
0.169457 ms. 19.9605%. aten::mul (1 nodes, out variant)
0.123695 ms. 14.5702%. aten::addmm (1 nodes, out variant)
0.118218 ms. 13.925%. aten::clamp (1 nodes, out variant)
0.0860747 ms. 10.1388%. aten::bmm (1 nodes, out variant)
0.0707332 ms. 8.33175%. aten::cat (1 nodes, out variant)
0.038814 ms. 4.57195%. aten::transpose (1 nodes)
0.0309244 ms. 3.64263%. aten::sigmoid (1 nodes, out variant)
0.0102666 ms. 1.20932%. static_runtime::flatten_copy (1 nodes, out variant)
0.0046297 ms. 0.545338%. prim::TupleConstruct (1 nodes, out variant)
0.000476333 ms. 0.0561079%. prim::ListConstruct (1 nodes, out variant)
0.848959 ms. in Total
StaticRuntime setup time: 0.018925 ms
Memory allocation time: 0.019808 ms
Memory deallocation time: 0.0120445 ms
Outputs deallocation time: 0.0864947 ms
Total memory managed: 19328 bytes
Total number of reused tensors: 3
Total number of 'out' variant nodes/total number of nodes: 9/11 (81.8182%)
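
For reference, the arithmetic behind the last line of the output above can be reproduced with a minimal Python sketch. This is only an illustration, not the static runtime implementation: the names EXAMPLE_NODES, out_variant_summary, and format_summary are hypothetical, and the out variant flags are copied by hand from the "Time per node type" listing.

```python
# Illustrative sketch only; not the static runtime C++ code.
# Each entry is (node kind, has out variant), transcribed from the
# "Time per node type" listing in the expected output above.
EXAMPLE_NODES = [
    ("aten::add", False),
    ("aten::mul", True),
    ("aten::clamp", True),
    ("aten::transpose", False),
    ("aten::bmm", True),
    ("static_runtime::flatten_copy", True),
    ("prim::ListConstruct", True),
    ("aten::cat", True),
    ("aten::addmm", True),
    ("aten::sigmoid", True),
    ("prim::TupleConstruct", True),
]


def out_variant_summary(nodes):
    """Return (out variant count, total node count, percentage)."""
    total = len(nodes)
    out_count = sum(1 for _, has_out in nodes if has_out)
    # Guard against an empty graph to avoid dividing by zero.
    pct = 100.0 * out_count / total if total else 0.0
    return out_count, total, pct


def format_summary(out_count, total, pct):
    """Format the summary with six significant digits for the percentage."""
    return (
        "Total number of 'out' variant nodes/total number of nodes: "
        f"{out_count}/{total} ({pct:.6g}%)"
    )


if __name__ == "__main__":
    print(format_summary(*out_variant_summary(EXAMPLE_NODES)))
```

With the eleven nodes listed above, nine are flagged as out variant, which gives the 9/11 ratio shown in the output.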
Reviewed By: hlu1
Differential Revision: D28553029
fbshipit-source-id: 55e7eab50b4b475ae219896100bdf4f6678875a4