[Pytorch] Improve scale and zero point extraction for per channel quantized (#53726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53726
In quantized linear layers, during deserialization we extract scales and zero
points that are later used by the qnnpack kernels.
Scale and zero point extraction for per-channel quantized tensors is slow:
we index directly into the zero point and scales tensors, and each index
creates a one-element tensor slice which is then cast to int32 or float.
This per-element overhead noticeably increases model loading time.
This diff fixes that.
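The cost difference can be sketched in Python (the actual change is in the C++ deserialization path; the function names `slow_extract`/`fast_extract` below are hypothetical and only illustrate the idea of replacing per-element slice-and-cast with a single bulk conversion):

```python
import torch

def slow_extract(scales, zero_points):
    # Slow pattern: each index builds a 1-element tensor slice,
    # which is then cast to a Python float / int, one channel at a time.
    s = [float(scales[i]) for i in range(scales.numel())]
    z = [int(zero_points[i]) for i in range(zero_points.numel())]
    return s, z

def fast_extract(scales, zero_points):
    # Faster pattern: convert each whole tensor to a Python list once,
    # avoiding per-channel tensor-slice creation.
    return scales.tolist(), zero_points.tolist()

# Per-channel quantization params: one scale/zero point per output channel.
scales = torch.rand(1024, dtype=torch.float64)
zero_points = torch.zeros(1024, dtype=torch.int64)

# Both paths produce identical values; only the extraction cost differs.
assert slow_extract(scales, zero_points) == fast_extract(scales, zero_points)
```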
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26922138
fbshipit-source-id: b78e8548f736e8fa2f6636324ab1a2239b94a27c