[quant] Add QuantizedMHA class (#79956)
Previously, nn.MultiheadAttention was quantized through the custom module mechanism, which used nn.quantizable.MultiheadAttention for both the observed and the quantized paths; this was a potential source of confusion. This PR adds a separate nn.quantized.MultiheadAttention class, which fully owns the quantized path. Note that after this change the old usage throws an error.
New usage:
```
>>> custom_module_config = {
...     'float_to_observed_custom_module_class': {
...         nn.MultiheadAttention: nn.quantizable.MultiheadAttention,
...     },
...     'observed_to_quantized_custom_module_class': {
...         nn.quantizable.MultiheadAttention: nn.quantized.MultiheadAttention,
...     }
... }
>>> tq.prepare(model, prepare_custom_config_dict=custom_module_config)
>>> tq.convert(model, convert_custom_config_dict=custom_module_config)
```
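For context, a minimal end-to-end eager-mode sketch of the new mapping might look like the following. The wrapper model, embedding sizes, and calibration data are illustrative, not part of this PR; the classes are imported via their `torch.ao.nn.*` paths, which is where they live in newer releases.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq
import torch.ao.nn.quantizable as nnqa
import torch.ao.nn.quantized as nnq

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # embed_dim/num_heads are arbitrary illustration values
        self.mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

    def forward(self, q, k, v):
        return self.mha(q, k, v)

model = Model().eval()
model.qconfig = tq.default_qconfig

custom_module_config = {
    'float_to_observed_custom_module_class': {
        nn.MultiheadAttention: nnqa.MultiheadAttention,
    },
    'observed_to_quantized_custom_module_class': {
        nnqa.MultiheadAttention: nnq.MultiheadAttention,
    },
}

prepared = tq.prepare(model, prepare_custom_config_dict=custom_module_config)
q = k = v = torch.randn(4, 1, 8)   # (seq_len, batch, embed_dim)
prepared(q, k, v)                  # calibration pass through the observers
quantized = tq.convert(prepared, convert_custom_config_dict=custom_module_config)
```

After `convert`, `quantized.mha` is an `nn.quantized.MultiheadAttention` instance rather than the quantizable class, which is the behavioral change this PR introduces.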
This supersedes an earlier PR that hit CI issues; the old discussion can be found at https://github.com/pytorch/pytorch/pull/71190
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79956
Approved by: https://github.com/z-a-f