[quant] Add Graph Mode Passes to quantize EmbeddingBag operators (#41612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41612
This change adds preliminary support for quantizing the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization and packing of the weights.
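To illustrate the kind of per-row weight quantization involved, here is a minimal NumPy sketch of 8-bit row-wise affine quantization. This is only an illustration of the scheme; the actual operators use fbgemm's packed layout (per-row scale/bias stored alongside the quantized rows), and the 4-bit path additionally packs two values per byte. The function name is hypothetical.

```python
import numpy as np

def rowwise_quantize_8bit(weight):
    # Illustrative per-row affine quantization: each row gets its own
    # scale and offset, mirroring the byte-prepack scheme at a high level.
    # NOT the real fbgemm packing layout.
    mins = weight.min(axis=1, keepdims=True)
    maxs = weight.max(axis=1, keepdims=True)
    scales = (maxs - mins) / 255.0
    scales[scales == 0] = 1.0  # avoid divide-by-zero on constant rows
    q = np.round((weight - mins) / scales).clip(0, 255).astype(np.uint8)
    return q, scales, mins

# Dequantization is then q * scale + min, applied row-wise.
```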
To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4-bit or 8-bit), the corresponding quantization function is called.
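A rough sketch of the qconfig wiring this describes (the exact invocation lives in the test plan's `test_embedding_bag`; the qconfig variable name here is hypothetical, and the PR-era code used the dynamic-quantization qconfig type):

```python
import torch
from torch.quantization import QConfig, NoopObserver

# Route the EmbeddingBag weights to the 4-bit quantization+packing path
# by naming the op in the observer's custom_op_name field.
embedding_bag_4bit_qconfig = QConfig(
    activation=NoopObserver.with_args(
        dtype=torch.float, custom_op_name="embedding_bag_4bit"),
    weight=NoopObserver.with_args(
        dtype=torch.float, custom_op_name="embedding_bag_4bit"),
)
```

The qconfig would then be assigned to the EmbeddingBag module (or its name) in the qconfig dict passed to the graph-mode quantization entry point.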
Refer to the test plan for how to invoke the qconfig for the embedding_bag ops.
Future versions will add 4-bit and 2-bit qtensors with native support for observing and quantizing them.
NB: This version assumes that the weights in the EmbeddingBag module reside on the same device.
Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
Imported from OSS
Reviewed By: vkuzo, jerryzh168
Differential Revision: D22609342
fbshipit-source-id: 23e33f44a451c26719e6e283e87fbf09b584c0e6