[static runtime] fix out variant for 4bit embedding bag (#55096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55096
There were issues with D26138322 (https://github.com/pytorch/pytorch/commit/5b0a6482c185428c41d8772a92fa0b77174821fc) that we didn't catch the first time around.
This change (rebased on top of the to_copy fixes) fixes the c2/pt output comparison for the converted remote_ro model.
Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 \
  ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --c2_model=/data/users/ansha/tmp/adfinder/210494966_0.predictor.disagg.remote_request_only \
  --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_input_data.pb \
  --pred_net=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_net2.pb \
  --c2_sigrid_transforms_opt=1 \
  --c2_apply_nomnigraph_passes=1 \
  --c2_use_memonger=1 \
  --scripted_model=/data/users/ansha/tmp/adfinder/models_dianshi/210494966_0.predictor.disagg.remote_request_only.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/models/remote_ro_wrapped_input_data.pt \
  --pt_enable_static_runtime=1 \
  --pt_cleanup_activations=1 \
  --pt_enable_out_variant=1 \
  --compare_results=1 \
  --iters=1 \
  --warmup_iters=1 \
  --num_threads=1 \
  --do_profile=0 \
  --benchmark_c2_predictor=1 \
  --do_benchmark=1
```
Reviewed By: hlu1
Differential Revision: D27477104
fbshipit-source-id: 5a95dfa7eae23566fadc3fec323ad03a34e6734d