[quant] Quantizable MultiheadAttention (#49866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49866
- Adds the `torch.nn.quantizable.MultiheadAttention` module.
The quantizable version can serve as a fully equivalent replacement for the `torch.nn.MultiheadAttention` module.
The main difference is that it allows the internal linear units to be observed after the `prepare` step in the quantization flow.
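To make the flow concrete, below is a minimal eager-mode sketch, not the PR's test code: the `TinyModel` wrapper, the tensor shapes, and the explicit custom-module config are illustrative assumptions, and the namespaces (`torch.quantization`, `torch.nn.quantizable`) reflect the state of the tree at the time of this PR (later releases moved them under `torch.ao`).
```
import torch
import torch.nn as nn
import torch.nn.quantizable as nnqa

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

    def forward(self, q, k, v):
        return self.mha(q, k, v)

model = TinyModel().eval()
model.qconfig = torch.quantization.default_qconfig

# `prepare` swaps nn.MultiheadAttention for the quantizable version and
# attaches observers to its internal linear units.
observed = torch.quantization.prepare(
    model,
    prepare_custom_config_dict={
        'float_to_observed_custom_module_class': {
            nn.MultiheadAttention: nnqa.MultiheadAttention,
        },
    },
)

# Calibration pass: the observers record activation statistics.
q = k = v = torch.randn(5, 1, 8)  # (seq_len, batch, embed_dim)
observed(q, k, v)

# `convert` invokes `from_observed` under the hood; depending on the release,
# an explicit convert_custom_config_dict entry mapping the observed class to
# its quantized counterpart may be needed.
quantized = torch.quantization.convert(observed)
```
Depending on the release, the float-to-observed pair may already be registered in the default mappings, making the explicit config unnecessary.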
Note: The `from_observed` method (called during `convert`) removes the `bias_k` and `bias_v` parameters and resets them as plain attributes.
This avoids the error raised when assigning a quantized tensor to a `torch.nn.Parameter`.
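As a toy illustration of that constraint (not the PR's code): wrapping a quantized tensor in `torch.nn.Parameter` fails because a `Parameter` requires gradients by default, which quantized tensors do not support.
```
import torch

# A quantized tensor cannot require grad, so nn.Parameter (which defaults
# to requires_grad=True) rejects it with a RuntimeError.
qb = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                               dtype=torch.quint8)
try:
    torch.nn.Parameter(qb)
except RuntimeError as e:
    print(e)

# The converted module therefore drops the Parameter entry and stores the
# quantized tensor as a plain attribute instead, e.g. (hypothetical sketch):
#   del module._parameters['bias_k']
#   module.bias_k = qb
```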
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_custom_module_multi_head_attention
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D25706179
fbshipit-source-id: e27ab641d8d1eccc64cf9e44343459331f89eea4