[c2][opt] nomnigraph transform for ClipRangesGatherSigridHash fusion (#37535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37535
Fuse ClipRanges + GatherRanges + SigridHash -> ClipRangesGatherSigridHash
The dper2 to dper3 migration of the dpa_product_ctr model is blocked by 3.6% higher prospector CPU usage. The root cause was traced to the sigrid transforms: in dper3, ClipRanges, GatherRanges, and SigridHash are called as separate ops, whereas in dper2 they run fused.
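For readers unfamiliar with this kind of graph rewrite, the fusion can be sketched on a toy op list. This is only an illustration in Python, not the nomnigraph C++ code in this PR. The `Op` class, the `fuse_clip_gather_hash` helper, the blob names, and the input/output layout of ClipRanges, SigridHash, and the fused op are all assumptions made for the example; op arguments are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Op:
    """Toy stand-in for an operator node (hypothetical, not a Caffe2 type)."""
    type: str
    inputs: List[str]
    outputs: List[str]


def fuse_clip_gather_hash(ops: List[Op]) -> List[Op]:
    """Replace ClipRanges -> GatherRanges -> SigridHash chains with one op.

    Assumed layouts (illustrative only):
      ClipRanges(ranges) -> clipped
      GatherRanges(data, clipped) -> (values, lengths)
      SigridHash(values) -> hashed
    """
    producer: Dict[str, Op] = {}
    consumers: Dict[str, int] = {}
    for op in ops:
        for blob in op.outputs:
            producer[blob] = op
        for blob in op.inputs:
            consumers[blob] = consumers.get(blob, 0) + 1

    replace_with: Dict[int, Op] = {}  # id(GatherRanges op) -> fused op
    absorbed = set()                  # ids of ClipRanges / SigridHash ops to drop
    for hash_op in ops:
        if hash_op.type != "SigridHash":
            continue
        gather = producer.get(hash_op.inputs[0])
        if gather is None or gather.type != "GatherRanges":
            continue
        if hash_op.inputs[0] != gather.outputs[0]:
            continue  # SigridHash must consume the gathered values
        clip = producer.get(gather.inputs[1])
        if clip is None or clip.type != "ClipRanges":
            continue
        # Only fuse when the intermediate blobs have no other consumer.
        # (A real pass would also check that they are not net outputs.)
        if consumers.get(clip.outputs[0]) != 1:
            continue
        if consumers.get(gather.outputs[0]) != 1:
            continue
        replace_with[id(gather)] = Op(
            "ClipRangesGatherSigridHash",
            [clip.inputs[0], gather.inputs[0]],
            [gather.outputs[1], hash_op.outputs[0]],
        )
        absorbed.update((id(clip), id(hash_op)))

    # Emit the fused op at the GatherRanges position so that every input is
    # already produced and every consumer still comes later.
    return [replace_with.get(id(op), op) for op in ops if id(op) not in absorbed]


net = [
    Op("ClipRanges", ["ranges"], ["clipped"]),
    Op("GatherRanges", ["data", "clipped"], ["values", "lengths"]),
    Op("SigridHash", ["values"], ["hashed"]),
    Op("SparseLengthsSum", ["table", "hashed", "lengths"], ["pooled"]),
]
print([op.type for op in fuse_clip_gather_hash(net)])
```

The single-consumer check is the important part of the sketch: if another op reads the clipped ranges or the unhashed values, removing those intermediate blobs would break it, so the chain is left unfused.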
Further context:
https://fb.quip.com/GijaAZtX5mav
https://fb.quip.com/pIDdAjJP2uiG
Test Plan:
Local benchmarking with small model 181513584_0
(The dper3 full model is 178772812; the dper2 refresh is 178770392.)
Transform turned on: P129799373
Iters per second: 609.291
Transform turned off: P129799397
Iters per second: 519.088
This result still needs to be confirmed on the full model in canary and in qrt.
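For reference, the relative gain implied by the two local throughput numbers above can be worked out as follows. This is plain arithmetic on the reported iters/sec; the variable names are just for illustration, and it says nothing about the full model.

```python
# Throughput reported by the local benchmark (iterations per second).
iters_per_sec_on = 609.291   # transform turned on
iters_per_sec_off = 519.088  # transform turned off

# Relative throughput and the matching reduction in time per iteration.
speedup = iters_per_sec_on / iters_per_sec_off
time_reduction = 1 - iters_per_sec_off / iters_per_sec_on

print(f"throughput: {speedup:.3f}x ({100 * (speedup - 1):.1f}% higher)")
print(f"time per iteration: {100 * time_reduction:.1f}% lower")
```

On the small model this comes to roughly 17% higher throughput, or about 15% less time per iteration.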
`buck build mode/opt-clang mode/no-gpu caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench`
`MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --pred_net=/data/users/ansha/tmp/dpa/small_pred_net.pb --c2_model=/data/users/ansha/tmp/dpa/181513584_0.predictor --c2_inputs=/data/users/ansha/tmp/dpa/c2_inputs_small.pb --iters=3000 --warmup_iters=100 --num_threads=32 --c2_apply_nomnigraph_passes=1 --caffe2_predictor_enable_preproc_fusion=1`
Prospector canary:
https://our.intern.facebook.com/intern/ads/canary/426280288521552095/
Check that ClipRangesGatherSigridHash is used: https://fburl.com/scuba/caffe2_operator_stats_canary/e6qfdsat
Reviewed By: yinghai
Differential Revision: D21262085
fbshipit-source-id: 2c2481e3d4977abb8abe6e9ef0c9999382320ab2