Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676
We implement a working eager mode quantization flow that uses tracing, together with `__torch_function__` and `torch.nn.Module.__call__` overrides, to automate the model modifications needed for quantization. Partial program capture (instead of full program capture) is used, which lets this scheme target a wide variety of user programs. Control flow over quantizable ops is not supported, but general control flow is supported.
In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and to call hooks before and after each quantizable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
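
As a rough illustration of the hook mechanism described above, here is a minimal, torch-free sketch. It is not the code in `auto_trace.py`: `Module`, `Linear`, `Model`, `QUANTIZABLE`, `auto_trace`, `before_hook` and `after_hook` are all hypothetical stand-ins, and the `__torch_function__` side is omitted. It only shows the general idea of temporarily overriding `__call__` so that hooks run before and after calls to selected module types while ordinary Python control flow executes normally.

```python
import contextlib


class Module:
    """Toy stand-in for torch.nn.Module (hypothetical, no torch dependency)."""

    def __call__(self, *args):
        return self.forward(*args)


class Linear(Module):
    """Toy 'quantizable' leaf module: multiplies its input by a scale."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return x * self.scale


class Model(Module):
    def __init__(self):
        self.fc = Linear(2.0)

    def forward(self, x):
        # General control flow runs as plain Python; nothing captures it.
        if x < 0:
            x = -x
        return self.fc(x)


# Assumption for this sketch: the set of module types treated as quantizable.
QUANTIZABLE = (Linear,)


@contextlib.contextmanager
def auto_trace(before_hook, after_hook):
    """Temporarily override Module.__call__ to run hooks around quantizable modules."""
    orig_call = Module.__call__

    def patched_call(self, *args):
        if not isinstance(self, QUANTIZABLE):
            # Non-quantizable modules are called as usual (partial capture).
            return orig_call(self, *args)
        args = before_hook(self, args)   # e.g. a place to observe or quantize inputs
        out = orig_call(self, *args)
        return after_hook(self, out)     # e.g. a place to observe or dequantize outputs

    Module.__call__ = patched_call
    try:
        yield
    finally:
        # Always restore the original behavior, even if the model raised.
        Module.__call__ = orig_call


events = []


def before_hook(mod, args):
    events.append(("before", type(mod).__name__))
    return args


def after_hook(mod, out):
    events.append(("after", type(mod).__name__))
    return out


with auto_trace(before_hook, after_hook):
    result = Model()(-3.0)
```

In this sketch only the `Linear` call is intercepted; the `if x < 0` branch inside `Model.forward` runs untouched, which mirrors the partial program capture described in the summary. In the real flow the hooks would consult per-module state (see `quantization_state.py`) rather than append to a list.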
Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```
Differential Revision:
D31992281
Reviewed By: HDCharles
Pulled By: vkuzo
fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967