[Person Seg] Compress the person seg model (#48008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48008
### Motivation
The idea is to quantize the weights during model export and dequantize them in `__setstate__` at runtime. To replicate exactly what Caffe2 did, only 10 conv layers are quantized.
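The export-time quantize / load-time dequantize idea can be sketched as follows. This is a minimal pure-Python illustration of affine uint8 quantization, not the actual implementation, which operates on torch tensors inside the custom prepack ops; the function names here are hypothetical.

```python
def quantize(weights, num_bits=8):
    """Affine-quantize a list of floats to unsigned ints (export time)."""
    qmax = (1 << num_bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # avoid div-by-zero for constant weights
    zero_point = round(-w_min / scale)
    # Clamp to the representable range after rounding.
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats (what setstate does at runtime)."""
    return [(v - zero_point) * scale for v in q]
```

Storing uint8 values plus a per-tensor scale and zero point is what yields the roughly 4x size reduction over fp32 weights, at the cost of the small precision loss noted below.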
Since the code here is restricted to the UNet model, I created a custom prepacking context to do the graph rewriting and register the custom ops.
To run on iOS/MacOS, we need to link `unet_metal_prepack` explicitly.
- buck build //xplat/caffe2/fb/custom_ops/unet_metal_prepack:unet_metal_prepackApple
- buck build //xplat/caffe2/fb/custom_ops/unet_metal_prepack:unet_metal_prepackAppleMac
On the server side, `unet_metal_prepack.cpp` needs to be compiled into `aten_cpu` in order to do the graph rewrite via `optimize_for_mobile`. However, since we don't want to ship it to production, some local hacks were made to make this happen. More details can be found in the following diffs.
### Results
-rw-r--r-- 1 taox staff 1.1M Nov 10 22:15 seg_init_net.pb
-rw-r--r-- 1 taox staff 1.1M Nov 10 22:15 seg_predict_net.pb
Note that since we quantize the weights, some precision loss is expected, but overall the results look good.
### ARD
- Person seg - v229
- Hair seg - v105
ghstack-source-id: 117019547
Test Plan:
### Video eval results from macos
{F345324969}
Differential Revision: D24881316
fbshipit-source-id: b67811d6d06de82130f4c22392cc961c9dda7559