MinMax based observers: respect device affinity for state_dict (#44537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537
Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers. They had custom
state_dict save/load code to ensure their state was saved.
At some point, these attributes became buffers, but the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (CPU or CUDA), and save its state_dict
* create model B, and load model A's state_dict into it
* `min_val|min_vals|max_val|max_vals` would always be loaded onto model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)
In practice, the case people would sometimes hit is (see the sketch after this list):
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices
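A minimal sketch of the failure mode (assumes a CUDA device is available and uses `torch.quantization.MinMaxObserver`; this is an illustration, not a verbatim repro from the test suite):
```
import torch
import torch.quantization as tq

# Model A: observe some data on CPU, then save.
obs_a = tq.MinMaxObserver()
obs_a(torch.randn(4, 4))            # populates min_val / max_val buffers on CPU
state = obs_a.state_dict()

# Model B: a fresh observer moved to GPU, then loaded.
obs_b = tq.MinMaxObserver().to('cuda')
obs_b.load_state_dict(state)
# Before this fix, the custom load code reattached min_val / max_val as the
# CPU tensors from the state_dict, leaving obs_b with buffers on mixed
# devices; the next forward could then fail with a cross-device assertion.
obs_b(torch.randn(4, 4, device='cuda'))
```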
This PR fixes the behavior by removing the custom save/load logic where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special-case `PerChannelMinMaxObserver` and its
children to allow loading buffers of a different size, which is expected
for per-channel statistics (see the sketch below).
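A minimal sketch of the shape of that special case (an assumed illustration, not the exact PR code): resize the local per-channel buffers to match the incoming state, then defer to the default `nn.Module._load_from_state_dict`, whose copy preserves the destination module's device:
```
import torch

class PerChannelObserverSketch(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Per-channel stats start empty; their final size depends on the
        # number of channels observed, so saved buffers can differ in shape
        # from a freshly constructed module's buffers.
        self.register_buffer('min_vals', torch.tensor([]))
        self.register_buffer('max_vals', torch.tensor([]))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata,
                              strict, missing_keys, unexpected_keys,
                              error_msgs):
        # Resize local buffers to the incoming shapes so the default loader's
        # copy_ succeeds; copy_ keeps the destination device, which is the
        # device-affinity behavior this PR restores.
        for name in ('min_vals', 'max_vals'):
            key = prefix + name
            if key in state_dict:
                getattr(self, name).resize_(state_dict[key].shape)
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)
```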
There are some followups to also enable this for `HistogramObserver`
and `FakeQuantize`, which can be done in separate PRs due to their
higher complexity.
Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
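The core property the test verifies is roughly the following (a hedged sketch, not the test's actual code):
```
import torch
import torch.quantization as tq

def check_device_affinity(device):
    src = tq.MinMaxObserver()
    src(torch.randn(4, 4))                  # record stats on CPU
    dst = tq.MinMaxObserver().to(device)
    dst.load_state_dict(src.state_dict())
    # Every buffer should follow the destination module's device.
    for name, buf in dst.named_buffers():
        assert buf.device.type == device, f'{name} is on {buf.device}'

if torch.cuda.is_available():
    check_device_affinity('cuda')
```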
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23644493
fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e