Out variant for embedding_bag_4bit_rowwise_offsets (#51324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51324
Add an out variant for embedding_bag_4bit_rowwise_offsets and register it in the static runtime registry.
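For readers unfamiliar with the term, an out variant writes its result into a caller-provided buffer instead of allocating a new output on every call. The sketch below illustrates only that pattern in plain Python. It is not the code from this change: the function names `embedding_bag_sum` and `embedding_bag_sum_out` are hypothetical, and the example omits quantization entirely, summing unquantized rows per bag.

```python
from typing import List


def embedding_bag_sum_out(
    out: List[List[float]],
    weight: List[List[float]],
    indices: List[int],
    offsets: List[int],
) -> List[List[float]]:
    """Out variant (hypothetical): sum rows of `weight` per bag into `out`."""
    dim = len(weight[0])
    for bag, start in enumerate(offsets):
        # A bag ends where the next one starts; the last bag runs to the end.
        end = offsets[bag + 1] if bag + 1 < len(offsets) else len(indices)
        row = out[bag]
        # Reset the reused buffer before accumulating into it.
        for d in range(dim):
            row[d] = 0.0
        for idx in indices[start:end]:
            for d in range(dim):
                row[d] += weight[idx][d]
    return out


def embedding_bag_sum(
    weight: List[List[float]], indices: List[int], offsets: List[int]
) -> List[List[float]]:
    """Functional variant (hypothetical): allocates a fresh output each call."""
    out = [[0.0] * len(weight[0]) for _ in offsets]
    return embedding_bag_sum_out(out, weight, indices, offsets)


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices = [0, 2, 1]
offsets = [0, 2]  # bag 0 uses indices[0:2], bag 1 uses indices[2:3]

# Allocate the output once and reuse it across calls.
buffer = [[0.0, 0.0] for _ in offsets]
first = embedding_bag_sum_out(buffer, weight, indices, offsets)
second = embedding_bag_sum_out(buffer, weight, indices, offsets)
```

Both calls return the same `buffer` object, so a runtime that manages its own memory can avoid a per-call allocation.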
Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 1 buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=$INLINE_CVR_DIR/210494966_0.predictor.disagg.remote_request_only_remote_cast.pt --pt_inputs=$INLINE_CVR_DIR/remote_ro_wrapped_input_data.pt --pt_enable_static_runtime=true --pt_cleanup_activations=true --pt_enable_out_variant=true --compare_results=true --iters=5000 --warmup_iters=5000 --num_threads=1 --do_profile=true
```
before:
```
0.789023 ms. 54.8408%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
```
after:
```
0.620817 ms. 49.7136%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
```
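Taking the two profile lines above at face value, the improvement for this op can be worked out directly. The snippet below is a small arithmetic sketch; the variable names are illustrative and the inputs are the two times copied from the profiler output.

```python
# Op times copied from the before/after profiler lines above (milliseconds).
before_ms = 0.789023
after_ms = 0.620817

saved_ms = before_ms - after_ms
reduction_pct = 100.0 * saved_ms / before_ms
speedup = before_ms / after_ms

print(f"saved {saved_ms:.6f} ms")
print(f"op time reduced by {reduction_pct:.1f}%")
print(f"speedup {speedup:.2f}x")
```

This works out to roughly a 21% reduction in time spent in the op, or about a 1.27x speedup for the op itself.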
Reviewed By: ajyu
Differential Revision: D26138322
fbshipit-source-id: 44d3f15d04636404ebd4c1e9eecf73c7ad972944
Author: Marat Subkhankulov