[pyper] casted_batch_one_hot_lengths: match 4-arg aten::to (#53215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53215
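The current casted_batch_one_hot_lengths fusion pattern uses the 5-arg `to`, which does not match (and therefore does not fuse) the inline_cvr model instances. Match the 4-arg `to` so the fusion applies to them.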
Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --c2_weights=/data/users/ansha/tmp/adfinder/models/c2_local_weight_data.pb --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_local_input_data.pb --pred_net=/data/users/ansha/tmp/adfinder/models/c2_local_net.pb --c2_sigrid_transforms_opt=1 --c2_apply_nomnigraph_passes=1 --c2_use_memonger=1 --scripted_model=/data/users/ansha/tmp/adfinder/models_dianshi/210494966_0.predictor.disagg.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/models/local_wrapped_input_data.pt --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --compare_results=1 --iters=2000 --warmup_iters=2000 --num_threads=1 --do_profile=1 --do_benchmark --benchmark_c2_predictor=1
```
```
Time per node type:
3.82029 ms. 71.8523%. aten::addmm (9 nodes)
0.926298 ms. 17.4219%. fb::sigrid_transforms (1 nodes)
0.122496 ms. 2.30391%. fb::clip_ranges_gather (210 nodes)
0.11985 ms. 2.25416%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (54 nodes)
0.0973721 ms. 1.83138%. aten::sigmoid (3 nodes)
0.0352937 ms. 0.663807%. fb::batch_box_cox (1 nodes)
0.034759 ms. 0.65375%. prim::TupleConstruct (1 nodes)
0.0222235 ms. 0.417981%. aten::index (4 nodes)
0.0215314 ms. 0.404964%. fb::casted_batch_one_hot_lengths (1 nodes)
0.0199659 ms. 0.375521%. fb::concat_add_mul_replacenan_clip (1 nodes)
0.0192885 ms. 0.362779%. aten::cat (2 nodes)
0.0181285 ms. 0.340963%. aten::mul (2 nodes)
0.0109381 ms. 0.205725%. aten::pow (1 nodes)
0.0091476 ms. 0.172049%. prim::ListConstruct (8 nodes)
0.00794012 ms. 0.149338%. aten::relu (2 nodes)
0.00668873 ms. 0.125802%. prim::ListUnpack (1 nodes)
0.00569745 ms. 0.107158%. aten::to (4 nodes)
0.00527507 ms. 0.099214%. aten::narrow_copy (4 nodes)
0.00483189 ms. 0.0908785%. fb::lengths_range (4 nodes)
0.00399056 ms. 0.0750548%. aten::logit (1 nodes)
0.00324574 ms. 0.0610462%. fb::gather_ranges (4 nodes)
0.00161166 ms. 0.0303122%. fb::clip_ranges (2 nodes)
5.31686 ms. in Total
StaticRuntime setup time: 0.016461 ms
Memory allocation time: 0.00220284 ms
Memory deallocation time: 0.118134 ms
Outputs deallocation time: 0.0674883 ms
Total memory managed: 716352 bytes
Total number of reused tensors: 22
```
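To sanity-check a profile like the one above, the per-node-type rows can be parsed and each share recomputed from the raw times. This is a minimal sketch, not part of the benchmark tool: `parse_profile` and the `ROW` regex are hypothetical helpers, and the row format is assumed from the output shown here.

```python
import re

# Matches rows such as "3.82029 ms. 71.8523%. aten::addmm (9 nodes)".
# The format is assumed from the profile output above.
ROW = re.compile(r"^\s*([\d.]+) ms\. ([\d.]+)%\. (\S+) \((\d+) nodes\)\s*$")


def parse_profile(text):
    """Return {op_name: (ms, percent, node_count)} for each per-node-type row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            ms, pct, name, nodes = m.groups()
            rows[name] = (float(ms), float(pct), int(nodes))
    return rows


# A few rows copied from the profile above; the "in Total" line is skipped
# by the parser because it has no percentage or node count.
sample = """\
3.82029 ms. 71.8523%. aten::addmm (9 nodes)
0.926298 ms. 17.4219%. fb::sigrid_transforms (1 nodes)
0.0215314 ms. 0.404964%. fb::casted_batch_one_hot_lengths (1 nodes)
5.31686 ms. in Total
"""

rows = parse_profile(sample)
total_ms = 5.31686  # taken from the "in Total" line
for name, (ms, pct, nodes) in rows.items():
    # Recompute each share from the raw time and compare with the report.
    print(f"{name}: reported {pct:.4f}%, recomputed {100 * ms / total_ms:.4f}%")
```

In this run the fused fb::casted_batch_one_hot_lengths node appears once and accounts for about 0.4% of the total time.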
Reviewed By: hlu1
Differential Revision: D26789260
fbshipit-source-id: 52adadddaae29a946de8a58bd592c06e6d4ce8c8