pytorch
e0be76fb - [static_runtime] fix num args for to_copy (#56441)

Committed 3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56441

Since aten::to is overloaded, match on its schema (not just the op name) when replacing it with static_runtime::to_copy.

Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 \
  ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --c2_model=/data/users/ansha/tmp/adfinder/210494966_0.predictor.disagg.remote_request_only \
  --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_input_data.pb \
  --pred_net=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_net2.pb \
  --c2_sigrid_transforms_opt=1 --c2_apply_nomnigraph_passes=1 --c2_use_memonger=1 \
  --scripted_model=/data/users/ansha/tmp/adfinder/models_dianshi/210494966_0.predictor.disagg.remote_request_only.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/models/remote_ro_wrapped_input_data.pt \
  --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 \
  --compare_results=1 --iters=1 --warmup_iters=1 --num_threads=1 \
  --do_profile=1 --benchmark_c2_predictor=0 --do_benchmark=0
```

```
Time per node type:
        0.623426 ms.    55.337%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
        0.331633 ms.   29.4367%. quantized::embedding_bag_byte_rowwise_offsets (71 nodes)
        0.123163 ms.   10.9323%. aten::to (155 nodes)
        0.038479 ms.    3.4155%. fb::lengths_to_offsets (155 nodes)
        0.004169 ms.  0.370052%. aten::embedding_bag (2 nodes)
        0.002549 ms.  0.226256%. static_runtime::to_copy (2 nodes)
        0.002512 ms.  0.222972%. prim::TupleConstruct (1 nodes)
        0.000667 ms. 0.0592048%. prim::dtype (2 nodes)
          1.1266 ms. in Total
StaticRuntime setup time: 0.009605 ms
Memory allocation time: 0.001907 ms
Memory deallocation time: 0.032401 ms
Outputs deallocation time: 0.020876 ms
Total memory managed: 256 bytes
Total number of reused tensors: 159
```

I verified all of the aten::to matches for the local, local_ro, and remote_ro nets, in both opt and dev mode.
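As a quick sanity check on the Test Plan profile, each percentage in the "Time per node type" output is that node type's time divided by the total across all node types. The Python sketch below recomputes those shares. The dictionary was transcribed by hand from the printed output and is not read from any API of the benchmark binary.

```python
# Per-node-type times in ms, transcribed by hand from the "Time per node type"
# section of the Test Plan output (not read from any benchmark API).
node_times_ms = {
    "quantized::embedding_bag_4bit_rowwise_offsets": 0.623426,
    "quantized::embedding_bag_byte_rowwise_offsets": 0.331633,
    "aten::to": 0.123163,
    "fb::lengths_to_offsets": 0.038479,
    "aten::embedding_bag": 0.004169,
    "static_runtime::to_copy": 0.002549,
    "prim::TupleConstruct": 0.002512,
    "prim::dtype": 0.000667,
}

# The total is the sum over node types; each share is time / total * 100.
total_ms = sum(node_times_ms.values())
shares = {name: 100.0 * t / total_ms for name, t in node_times_ms.items()}

for name, t in node_times_ms.items():
    print(f"{t:.6f} ms  {shares[name]:9.4f}%  {name}")
print(f"{total_ms:.4f} ms in total")  # 1.1266 ms in total
```

The recomputed total matches the printed "1.1266 ms. in Total". The aten::to share comes out at about 10.93%, in line with the profile.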

Only 2 of the calls are replaced, because the other 155 have either the input or the output of the op returned as an external output. The same holds for the other instances of aten::to in the local and local_ro nets.

Reviewed By: hlu1

Differential Revision: D27872350

fbshipit-source-id: b72785ea2768be415faae2afcf9915aef07daec2
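The replacement rule in this commit has two parts. First, match the specific aten::to overload by its schema. Second, skip any node whose input or output is returned from the graph. Both can be modeled in a few lines of Python.

This is a toy model, not Static Runtime's actual pass, which is written in C++ and operates on TorchScript IR. Several things are assumptions:

- `Node`, `Graph`, and `replace_to_with_copy` are hypothetical names.
- The schema strings are illustrative approximations of two aten::to overloads.
- Choosing `to.dtype` as the target overload is an assumption.
- The escape check compares value names directly. A real pass would more likely rely on alias analysis.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative (assumed) schema strings for two aten::to overloads.
TO_DTYPE = ("aten::to.dtype(Tensor self, ScalarType dtype, bool non_blocking=False, "
            "bool copy=False, MemoryFormat? memory_format=None) -> Tensor")
TO_PRIM_DTYPE = ("aten::to.prim_dtype(Tensor(a) self, int? dtype=None, "
                 "bool non_blocking=False, bool copy=False) -> Tensor(a|b)")


@dataclass
class Node:  # hypothetical toy IR node
    kind: str           # operator name, e.g. "aten::to"
    schema: str         # full overload schema the node was resolved to
    inputs: List[str]   # names of input values
    outputs: List[str]  # names of output values


@dataclass
class Graph:  # hypothetical toy graph
    nodes: List[Node]
    outputs: Set[str] = field(default_factory=set)  # values returned externally


def replace_to_with_copy(graph: Graph, schema: str = TO_DTYPE) -> int:
    """Rewrite eligible aten::to nodes to static_runtime::to_copy; return the count."""
    replaced = 0
    for node in graph.nodes:
        # aten::to is overloaded, so the op name alone is ambiguous:
        # compare the full schema instead of only node.kind.
        if node.kind != "aten::to" or node.schema != schema:
            continue
        # Skip nodes whose input or output is returned from the graph
        # (simplified: direct name check, no alias analysis).
        if (set(node.inputs) | set(node.outputs)) & graph.outputs:
            continue
        node.kind = "static_runtime::to_copy"
        replaced += 1
    return replaced


g = Graph(
    nodes=[
        Node("aten::to", TO_DTYPE, ["x"], ["a"]),       # internal value: replaced
        Node("aten::to", TO_DTYPE, ["y"], ["out"]),     # output escapes: kept
        Node("aten::to", TO_PRIM_DTYPE, ["z"], ["b"]),  # other overload: kept
    ],
    outputs={"out"},
)
print(replace_to_with_copy(g))  # 1
```

In this model, the schema comparison is what stops a different aten::to overload from being rewritten. Such an overload has a different argument list from the one the replacement op expects.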