[Static Runtime] Support native op split_with_sizes (#69999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69999
This adds support for the `split_with_sizes` operator in static runtime by adding a native operator. Native operators have less overhead compared to their JIT fallbacks (no dispatching, no stack construction at runtime).
`split_with_sizes` can be called directly from the C++ API, or via `torch.split` when the split sizes argument is a list. This diff adds support for both use cases.
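As a minimal illustration (not part of this diff), the sketch below scripts a function that exercises both call paths described above; inspecting its graph should show the `aten::split_with_sizes` nodes that the new native operator handles.
```
from typing import List, Tuple
import torch

@torch.jit.script
def split_both_ways(x: torch.Tensor) -> Tuple[List[torch.Tensor], List[torch.Tensor]]:
    # Direct call to the op via the Tensor method.
    direct = x.split_with_sizes([2, 3], dim=0)
    # torch.split with a list of sizes reaches the same op (see summary above).
    via_split = torch.split(x, [2, 3], dim=0)
    return direct, via_split

# Inspect the scripted graph for aten::split_with_sizes nodes.
print(split_both_ways.graph)
```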
Test Plan:
- Added unit tests and verified that the native operators are selected
- Benchmark
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/305797439_0.predictor.precompute.remote_request_only \
--method_name=user.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=500 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/305797439_0_user.inputs.recordio \
--recordio_use_ivalue_format=1 --do_profile=1 --do_benchmark=1
```
#### Before
```
Static runtime ms per iter: 3.62073. Iters per second: 276.187
0.0471904 ms. 1.31501%. aten::split_with_sizes (5 nodes)
```
#### After
```
Static runtime ms per iter: 3.44374. Iters per second: 290.382
0.0432057 ms. 1.34276%. aten::split_with_sizes (5 nodes, native)
```
Reviewed By: swolchok
Differential Revision: D33141006
fbshipit-source-id: feae34c4c873fc22d48a8ff3bf4d71c0e00bb365