[pyper] add flag to disable clip_ranges_gather fusions (#69198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69198
Add the flag --enable_clip_ranges_gather_fusions, which can be set to 0 to disable the clip_ranges+gather_ranges fusions.
This fusion happens in static runtime, and it also happens in jit when optimize_sparse_nn_model is used.
Note that the clip_ranges+gather_ranges+sigrid_hash fusions use different code that was untouched by D30515441 (https://github.com/pytorch/pytorch/commit/01b30922dd8bb6fcc349316595d14e4e76ea6a9d), so they are not explicitly disabled for now.
This also effectively disables ClipRangesGatherSigridHash(graph) (even though it's not explicitly gated by the flag), because that fusion looks for the clip_ranges_gather_lengths_to_offsets fusion, which won't exist when --enable_clip_ranges_gather_fusions is set to 0.
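The dependency between the two fusions can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code changed in this PR: the graph is modeled as a flat list of op names rather than a real JIT graph, and the helpers `fuse_pattern` and `optimize`, the unprefixed op names, and the fused name `clip_ranges_gather_sigrid_hash` are hypothetical stand-ins chosen for illustration.

```python
from typing import List


def fuse_pattern(ops: List[str], pattern: List[str], fused: str) -> List[str]:
    """Replace each consecutive occurrence of `pattern` in `ops` with `fused`."""
    out: List[str] = []
    i = 0
    n = len(pattern)
    while i < len(ops):
        if ops[i:i + n] == pattern:
            out.append(fused)
            i += n
        else:
            out.append(ops[i])
            i += 1
    return out


def optimize(ops: List[str], enable_clip_ranges_gather_fusions: bool = True) -> List[str]:
    """Toy pass pipeline (hypothetical): only the first two passes are gated."""
    if enable_clip_ranges_gather_fusions:
        # Gated fusions: clip_ranges + gather_ranges (+ lengths_to_offsets).
        ops = fuse_pattern(
            ops,
            ["clip_ranges", "gather_ranges", "lengths_to_offsets"],
            "clip_ranges_gather_lengths_to_offsets",
        )
        ops = fuse_pattern(ops, ["clip_ranges", "gather_ranges"], "clip_ranges_gather")
    # This pass is NOT gated, but it matches on the output of a gated pass,
    # so it finds nothing to rewrite when the flag is off.
    ops = fuse_pattern(
        ops,
        ["clip_ranges_gather_lengths_to_offsets", "sigrid_hash"],
        "clip_ranges_gather_sigrid_hash",  # assumed name, for illustration only
    )
    return ops


graph = [
    "clip_ranges", "gather_ranges", "lengths_to_offsets", "sigrid_hash",
    "clip_ranges", "gather_ranges",
]
print(optimize(graph, enable_clip_ranges_gather_fusions=True))
print(optimize(graph, enable_clip_ranges_gather_fusions=False))
```

With the flag on, both the gated fusions and the dependent fusion fire; with it off, the dependent pass still runs but leaves the toy graph unchanged, which mirrors the "effectively disabled" behavior described above.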
Test Plan:
Run ptvsc2_predictor_bench with --enable_clip_ranges_gather_fusions=0 and SR=1
```
Input size: 211
Static runtime ms per iter: 11.9668. Iters per second: 83.5643
Time per node type:
6.42796 ms. 54.5663%. static_runtime::fused_variadic_sigrid_transforms_torch_bind (1 nodes, out variant)
1.64969 ms. 14.0041%. fb::quantized_linear (9 nodes, out variant)
0.475394 ms. 4.03557%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (158 nodes, out variant)
0.367554 ms. 3.12013%. aten::argmin (1 nodes, out variant)
0.358351 ms. 3.04201%. aten::matmul (1 nodes, out variant)
0.215082 ms. 1.82581%. static_runtime::to_copy (805 nodes, out variant)
0.214397 ms. 1.81999%. fb::gather_ranges (313 nodes, out variant)
0.179759 ms. 1.52595%. fb::offsets_to_ranges (655 nodes, out variant)
0.173236 ms. 1.47058%. fb::lengths_to_offsets (464 nodes, out variant)
0.151249 ms. 1.28394%. aten::sub (1 nodes, out variant)
0.14017 ms. 1.18989%. aten::sigmoid (3 nodes, out variant)
0.136118 ms. 1.15549%. aten::mul (5 nodes, out variant)
0.130813 ms. 1.11046%. aten::sum (3 nodes, out variant)
0.124876 ms. 1.06006%. aten::repeat (1 nodes, out variant)
0.12191 ms. 1.03488%. static_runtime::signed_log1p (1 nodes, out variant)
0.0922349 ms. 0.782972%. aten::norm (1 nodes, out variant)
0.0877845 ms. 0.745193%. aten::pow (1 nodes, out variant)
0.0783335 ms. 0.664966%. fb::batch_box_cox (1 nodes, out variant)
0.0755047 ms. 0.640951%. fb::clip_ranges (311 nodes, out variant)
0.0702456 ms. 0.596308%. static_runtime::layer_norm (2 nodes, out variant)
0.0696762 ms. 0.591475%. fb::quantize_per_tensor (4 nodes)
0.0556873 ms. 0.472724%. quantized::embedding_bag_byte_prepack (3 nodes, out variant)
0.0555237 ms. 0.471335%. prim::VarConcat (2 nodes, out variant)
0.0437336 ms. 0.37125%. static_runtime::dict_unpack (2 nodes, native)
0.0390592 ms. 0.33157%. static_runtime::dequantize_copy (9 nodes, out variant)
0.0385823 ms. 0.327521%. fb::concat_add_mul_replacenan_clip (1 nodes, out variant)
0.0321869 ms. 0.273231%. prim::TupleConstruct (1 nodes, out variant)
0.0308289 ms. 0.261703%. fb::casted_batch_one_hot_lengths (1 nodes, out variant)
0.0280272 ms. 0.23792%. static_runtime::reshape_copy (2 nodes, out variant)
0.0244705 ms. 0.207727%. fb::sigrid_hash_precompute (1 nodes, out variant)
0.020917 ms. 0.177562%. static_runtime::VarTupleUnpack (1 nodes, native)
0.0175842 ms. 0.149271%. aten::div (1 nodes, out variant)
0.0169989 ms. 0.144302%. aten::narrow_copy (4 nodes, out variant)
0.00818147 ms. 0.0694517%. aten::logit (1 nodes, out variant)
0.00719822 ms. 0.061105%. prim::VarStack (1 nodes, out variant)
0.00687292 ms. 0.0583435%. aten::add (1 nodes, out variant)
0.00328646 ms. 0.0278985%. aten::clamp_min (1 nodes, out variant)
0.00325073 ms. 0.0275951%. static_runtime::expand_dims_copy (1 nodes, out variant)
0.00295617 ms. 0.0250946%. static_runtime::flatten_copy (1 nodes, out variant)
0.00230511 ms. 0.0195679%. aten::expand_as (1 nodes, native)
0.00182061 ms. 0.015455%. aten::full_like (1 nodes, out variant)
0.000268152 ms. 0.00227631%. prim::ListConstruct (1 nodes, out variant)
11.7801 ms. in Total
```
Servicelabs:
AF: https://www.internalfb.com/intern/servicelab/1001770528/
AI: https://www.internalfb.com/intern/servicelab/402342245/
Prospector: https://www.internalfb.com/intern/servicelab/502342630/
Reviewed By: movefast1990
Differential Revision: D32750847
fbshipit-source-id: b809a72a9fbeea86080346962eb17761e71397d8