d56adb1b - [quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220)

Summary:
Previously we had two places where we needed to decide whether or not to insert an observer or fake quantize: (1) the input arguments of a node and (2) the output of a node, each with its own code path. In this PR the logic is unified in the `_needs_obs_or_fq` helper function, which takes the target_dtype and is_dynamic of the previous output and the target_dtype and is_dynamic of the current Tensor being considered.

Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```
Say we have an `input_node` object for the argument `input` and a `conv_node` for the `conv` node in the graph.

(1) Input arguments, e.g. `input`:
The target_dtype/is_dynamic of the previous output comes from the node that produces `input`; we get it from `input_node.meta["target_dtype_info"]["output_act_obs_or_fq"]`. The target_dtype/is_dynamic for the current argument `input` comes from `conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"]`; similarly, for `weight` it comes from `conv_node.meta["target_dtype_info"]["weight_obs_or_fq"]`, etc.

(2) Output of the conv node:
The target_dtype/is_dynamic of the previous output is the floating-point output of the fp32 convolution operator, so it is hardcoded to `(torch.float, False)`. Technically we should get this from `node.meta["val"]`, but since the current code base is shared by FX graph mode quantization and PyTorch 2.0 export quantization we cannot do that yet; we can revisit after we decide to deprecate FX graph mode quantization. The target_dtype/is_dynamic for the current output comes from `conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"]`.

There is one caveat about dynamic quantization, which is explained in a comment in the code, so it is not repeated here.

Note: this PR also fixes some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` so that "not specified" is treated the same as specifying a fp32 placeholder observer.

Next: we can merge the two functions that get the target dtype and is_dynamic, to reduce code duplication.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
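To make the unified decision concrete, below is a minimal sketch of a helper with the four inputs the summary describes (previous output's target_dtype/is_dynamic, current Tensor's target_dtype/is_dynamic). This is not the actual PyTorch implementation: the real `_needs_obs_or_fq` has additional parameters and handles the dynamic-quantization caveat differently, and the example dtypes below are assumptions for illustration only.
```
# Hypothetical sketch of the decision described in the summary; NOT the
# real PyTorch `_needs_obs_or_fq`, which takes extra parameters and
# handles the dynamic-quantization caveat mentioned above.
import torch

def _needs_obs_or_fq(
    prev_output_dtype: torch.dtype,
    prev_output_is_dynamic: bool,
    cur_target_dtype: torch.dtype,
    cur_target_is_dynamic: bool,
) -> bool:
    """Return True if an observer/fake-quantize should be inserted
    between the producer of a Tensor and the current use of it."""
    if cur_target_is_dynamic:
        # Dynamic quantization: quantization parameters are computed at
        # runtime, so an observer is needed whenever the incoming value
        # is floating point (simplified; see the caveat in the summary).
        return prev_output_dtype == torch.float
    # Static case: insert an observer/fq only at a dtype or dynamism
    # boundary between the previous output and the current target.
    return (
        prev_output_dtype != cur_target_dtype
        or prev_output_is_dynamic != cur_target_is_dynamic
    )

# Example for the `input` argument of the conv node above: prev comes
# from input_node.meta["target_dtype_info"]["output_act_obs_or_fq"],
# cur from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"].
print(_needs_obs_or_fq(torch.float, False, torch.quint8, False))  # True
print(_needs_obs_or_fq(torch.float, False, torch.float, False))   # False
```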