[quant] Add QuantizedMHA class (#79956)
Previously, nn.MultiheadAttention was quantized through the custom module mechanism, which used nn.quantizable.MultiheadAttention for both the observed and the quantized paths; this was a potential source of confusion. This PR adds a separate nn.quantized.MultiheadAttention class, which fully owns the quantized path. Note that after this change the old usage throws an error.
New usage:
```
>>> custom_module_config = {
...     'float_to_observed_custom_module_class': {
...         nn.MultiheadAttention: nn.quantizable.MultiheadAttention,
...     },
...     'observed_to_quantized_custom_module_class': {
...         nn.quantizable.MultiheadAttention: nn.quantized.MultiheadAttention,
...     }
... }
>>> tq.prepare(model, prepare_custom_config_dict=custom_module_config)
>>> tq.convert(model, convert_custom_config_dict=custom_module_config)
```
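For context, a minimal end-to-end eager-mode sketch of the new mapping might look like the following. The wrapper model, embedding sizes, and calibration data are illustrative, not part of this PR; the classes are imported via their `torch.ao.nn.*` paths, which is where they live in newer releases.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq
import torch.ao.nn.quantizable as nnqa
import torch.ao.nn.quantized as nnq

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # embed_dim/num_heads are arbitrary illustration values
        self.mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

    def forward(self, q, k, v):
        return self.mha(q, k, v)

model = Model().eval()
model.qconfig = tq.default_qconfig

custom_module_config = {
    'float_to_observed_custom_module_class': {
        nn.MultiheadAttention: nnqa.MultiheadAttention,
    },
    'observed_to_quantized_custom_module_class': {
        nnqa.MultiheadAttention: nnq.MultiheadAttention,
    },
}

prepared = tq.prepare(model, prepare_custom_config_dict=custom_module_config)
q = k = v = torch.randn(4, 1, 8)   # (seq_len, batch, embed_dim)
prepared(q, k, v)                  # calibration pass through the observers
quantized = tq.convert(prepared, convert_custom_config_dict=custom_module_config)
```

After `convert`, `quantized.mha` is an `nn.quantized.MultiheadAttention` instance rather than the quantizable class, which is the behavioral change this PR introduces.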
This supersedes an earlier PR that hit CI issues; the old discussion can be found at https://github.com/pytorch/pytorch/pull/71190
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79956
Approved by: https://github.com/z-a-f