Support per channel quantization in insert_quant_dequant and fold_prepack (#29492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29492
Previously, graph mode quantization only worked for per-tensor quantization; this PR adds support for per-channel quantization as well. Changes include:
- insert per-channel quantization calls (insert_quant_dequant)
- add support for folding prepacked per-channel quantized weights (fold_prepack)
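For context, a minimal pure-Python sketch (not the actual PyTorch implementation) of the difference the pass now handles: per-tensor quantization uses one scale/zero_point for the whole weight tensor, while per-channel quantization uses one scale/zero_point per output channel (here, per row of the weight matrix):

```python
def quantize_per_tensor(w, scale, zero_point, qmin=-128, qmax=127):
    # One scale/zero_point shared by every element of the tensor.
    return [[max(qmin, min(qmax, round(x / scale) + zero_point)) for x in row]
            for row in w]

def quantize_per_channel(w, scales, zero_points, qmin=-128, qmax=127):
    # One scale/zero_point per output channel (row of the weight matrix),
    # which preserves more precision when channel ranges differ.
    return [[max(qmin, min(qmax, round(x / s) + zp)) for x in row]
            for row, s, zp in zip(w, scales, zero_points)]

w = [[0.5, -0.25], [2.0, -1.0]]
qt = quantize_per_tensor(w, scale=0.02, zero_point=0)
qc = quantize_per_channel(w, scales=[0.01, 0.02], zero_points=[0, 0])
```

In graph mode these show up as distinct quantize ops, which is why insert_quant_dequant and fold_prepack each need a per-channel path.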
Test Plan:
Testing is not possible until we can script PerChannelObserver, which comes in https://github.com/pytorch/pytorch/pull/29416; we'll add tests in a separate PR after that.
Imported from OSS
Differential Revision: D18580444
fbshipit-source-id: 347c07f201648ec49f070523642a9170278f8aa4