[quant][graphmode] Observing input/output values in call site (#33277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33277
Currently we insert observers in the called graph, which is incorrect because graphs can be shared,
and whether an observer should be inserted may depend on where the graph is called.
For example, for the call sequence `self.conv1(self.conv2(x))`, we can't insert observers correctly
if `self.conv1` and `self.conv2` share the same type. The current implementation inserts
observers in the graph of the forward method of Conv2d, but this call sequence requires
only one observer for the value that is both the output of self.conv2 and the input of self.conv1.
Instead, we need to insert observers for the input/output values of the graph at the call site.
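The problem can be illustrated with a toy model. This is not PyTorch code: `CallSite`, `callee_flag_results`, and `observe_at_call_site` are hypothetical names, and the model only counts how many observers attach to each value, assuming the goal is exactly one observer per value. When observers live inside the shared Conv2d graph, a single (observe input, observe output) decision applies to both calls, and none of the four choices observes `x`, `y`, and `z` exactly once. Deciding at each call site does.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CallSite:
    """Hypothetical record of one call: which graph is called, on which values."""
    graph: str  # e.g. the graph of Conv2d.forward, shared by conv1 and conv2
    inp: str    # name of the value passed in
    out: str    # name of the value produced


# self.conv1(self.conv2(x)): conv2 runs first and its output y feeds conv1.
CALLS = [
    CallSite(graph="Conv2d.forward", inp="x", out="y"),  # self.conv2(x)
    CallSite(graph="Conv2d.forward", inp="y", out="z"),  # self.conv1(y)
]


def callee_flag_results(calls):
    """Observers inside the callee: one (observe_input, observe_output) choice
    per shared graph, applied identically at every call of that graph."""
    results = {}
    for observe_in, observe_out in product([False, True], repeat=2):
        counts = Counter()
        for call in calls:
            if observe_in:
                counts[call.inp] += 1
            if observe_out:
                counts[call.out] += 1
        results[(observe_in, observe_out)] = counts
    return results


def observe_at_call_site(calls):
    """Observers at the call site: each caller observes the values it passes
    and receives, skipping a value that already has an observer."""
    counts = Counter()
    for call in calls:
        for value in (call.inp, call.out):
            if counts[value] == 0:  # indexing a missing key returns 0
                counts[value] = 1
    return counts


for flags, counts in callee_flag_results(CALLS).items():
    print(flags, dict(counts))
print("call site:", dict(observe_at_call_site(CALLS)))
```

With both flags set, `y` ends up with two observers; with either flag cleared, `x` or `z` goes unobserved. The call-site version observes each value once.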
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20208787
fbshipit-source-id: 739e1d877639c0d0ed24e573bbd36211defa6836