[coreml] Introducing Quantization (#78108)
Summary: Add a quantization mode to preprocess, which allows us to run quantization on Core ML models.
Test Plan:
https://fburl.com/anp/r0ntsbq0
Notebook running through the quantization workflow:
Created a custom Bento kernel to run it through Core ML:
```
bento_kernel(
    name = "coreml",
    deps = [
        "fbsource//third-party/pypi/coremltools:coremltools",
        "//caffe2:coreml_backend",
        "//caffe2:coreml_backend_cpp",
        "//caffe2:torch",
        "//caffe2/torch/fb/mobile/model_exporter:model_exporter",
    ],
)
```
Initial benchmarks on iPhone 11:
FP32 Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/203998485252700
Quantized Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/927584023592505
High End Quantized Model:
https://our.intern.facebook.com/intern/aibench/details/396271714697929
Summarized results:

| Backend | Quantization | p50 net latency (ms) | Model size |
|---------|--------------|----------------------|------------|
| Core ML | No           | 1.2200               | 1.2 MB     |
| Core ML | Yes          | 1.2135               | 385 KB     |
| CPU     | Yes          | 3.1720               | 426 KB     |
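The size reduction in the table is consistent with storing weights in 8 bits rather than 32. As a rough illustration only (not the coremltools implementation, which handles lookup tables, per-layer modes, etc.), linear 8-bit weight quantization can be sketched in pure Python:

```python
def quantize_weights(weights, nbits=8):
    """Linearly map float weights onto integers in [0, 2**nbits - 1]."""
    lo, hi = min(weights), max(weights)
    # Step size between adjacent quantized levels; guard against a
    # constant weight vector (hi == lo) to avoid division by zero.
    scale = (hi - lo) / (2 ** nbits - 1) or 1.0
    quantized = [round((w - lo) / scale) for w in weights]
    return quantized, scale, lo

def dequantize_weights(quantized, scale, lo):
    """Recover approximate float weights from the quantized values."""
    return [q * scale + lo for q in quantized]

weights = [-0.75, -0.1, 0.0, 0.42, 1.3]
q, scale, lo = quantize_weights(weights)
recovered = dequantize_weights(q, scale, lo)
# Round-to-nearest bounds the per-weight error by half the step size.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, recovered))
```

Each weight drops from 4 bytes to 1, which matches the roughly 3x shrink observed (1.2 MB to 385 KB) once non-weight overhead in the model file is accounted for.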
Reviewed By: SS-JIA
Differential Revision: D36559966
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78108
Approved by: https://github.com/jmdetloff