[quant] Quantizable MultiheadAttention (#49866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49866
- Adds the `torch.nn.quantizable.MultiheadAttention` module.
The quantizable version can serve as a fully equivalent replacement for the `torch.nn.MultiheadAttention` module.
The main difference is that it allows the internal linear units to be observed after the `prepare` step in the quantization flow.
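To make the flow concrete, below is a minimal eager-mode sketch, not the PR's test code: the `TinyModel` wrapper, the tensor shapes, and the explicit custom-module config are illustrative assumptions, and the namespaces (`torch.quantization`, `torch.nn.quantizable`) reflect the state of the tree at the time of this PR (later releases moved them under `torch.ao`).
```
import torch
import torch.nn as nn
import torch.nn.quantizable as nnqa

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

    def forward(self, q, k, v):
        return self.mha(q, k, v)

model = TinyModel().eval()
model.qconfig = torch.quantization.default_qconfig

# `prepare` swaps nn.MultiheadAttention for the quantizable version and
# attaches observers to its internal linear units.
observed = torch.quantization.prepare(
    model,
    prepare_custom_config_dict={
        'float_to_observed_custom_module_class': {
            nn.MultiheadAttention: nnqa.MultiheadAttention,
        },
    },
)

# Calibration pass: the observers record activation statistics.
q = k = v = torch.randn(5, 1, 8)  # (seq_len, batch, embed_dim)
observed(q, k, v)

# `convert` invokes `from_observed` under the hood; depending on the release,
# an explicit convert_custom_config_dict entry mapping the observed class to
# its quantized counterpart may be needed.
quantized = torch.quantization.convert(observed)
```
Depending on the release, the float-to-observed pair may already be registered in the default mappings, making the explicit config unnecessary.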
Note: The `from_observed` method (called during `convert`) removes the `bias_k` and `bias_v` parameters and resets them as plain attributes.
This avoids the error raised when assigning a quantized tensor to a `torch.nn.Parameter`.
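As a toy illustration of that constraint (not the PR's code): wrapping a quantized tensor in `torch.nn.Parameter` fails because a `Parameter` requires gradients by default, which quantized tensors do not support.
```
import torch

# A quantized tensor cannot require grad, so nn.Parameter (which defaults
# to requires_grad=True) rejects it with a RuntimeError.
qb = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                               dtype=torch.quint8)
try:
    torch.nn.Parameter(qb)
except RuntimeError as e:
    print(e)

# The converted module therefore drops the Parameter entry and stores the
# quantized tensor as a plain attribute instead, e.g. (hypothetical sketch):
#   del module._parameters['bias_k']
#   module.bias_k = qb
```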
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_custom_module_multi_head_attention
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D25706179
fbshipit-source-id: e27ab641d8d1eccc64cf9e44343459331f89eea4