Enable float only requantization. Part 1. (#35856)
Summary:
This PR is motivated by two issues it tries to address:
1) Relax the constraint that the requantization scale must be < 1.
2) Unify the requantization methodology across the PyTorch integrations of QNNPACK and FBGEMM.
Here we address the first part, for Conv and Linear.
The existing requantization scheme performs the scale multiplication entirely in integer arithmetic by extracting the mantissa and exponent of the FP scale and processing them, including the appropriate rounding. The corresponding instruction sequence is specifically tailored to the case where scale < 1.
Relaxing this constraint requires fixing that instruction sequence. In this PR we take a simpler approach: convert Int32 to FP32, apply the scale, and convert FP32 back to Int32 with the appropriate rounding, round-to-nearest-ties-to-even. This is followed by the zero-point add and clamping. Since 32-bit ARM has no nearest-ties-to-even rounding instruction, its sequence is a little different. The sequences for both 32-bit and 64-bit are taken from https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qnnpack/src/requantization/fp32-neon.c.
Furthermore, relaxing the scale constraint and moving towards FP requantization also helps us move towards a unified requantization procedure across QNNPACK and FBGEMM.
Summary of the PR:
- Requantization params are modified to lift out computation that would otherwise have to be done inside the aarch32 kernels, particularly:
- computing vfmin, vfmax, vfmagic and vimagic.
- Fixed q8gemm, q8conv and q8dwconv kernels.
- Fixed the corresponding tests.
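For reference, a scalar model of what the lifted vfmin/vfmax/vfmagic/vimagic params are used for on aarch32, where no nearest-ties-to-even conversion instruction exists: clamp in the float domain, then round via the classic "magic number" bit trick, following fp32-neon.c. The function itself is illustrative; only the parameter names mirror the PR:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative scalar model of the aarch32 requantization path.
 * vfmin/vfmax/vfmagic/vimagic are the values this PR precomputes
 * in the requantization params instead of inside the kernel. */
static inline uint8_t requantize_magic(
    int32_t acc, float scale, uint8_t zero_point,
    uint8_t qmin, uint8_t qmax) {
  const float vfmin = (float) ((int32_t) qmin - (int32_t) zero_point);
  const float vfmax = (float) ((int32_t) qmax - (int32_t) zero_point);
  const float vfmagic = 12582912.0f; /* 1.5 * 2^23 */
  const int32_t vimagic = INT32_C(0x4B400000) - (int32_t) zero_point;

  float x = (float) acc * scale;
  /* Clamp before rounding, in the float domain. */
  if (x < vfmin) x = vfmin;
  if (x > vfmax) x = vfmax;
  /* Adding vfmagic forces x into a binade whose ulp is 1, so the FP add
   * itself performs round-to-nearest-ties-to-even; bit-casting and
   * subtracting vimagic recovers the integer with the zero point
   * already folded in. */
  const float biased = x + vfmagic;
  int32_t bits;
  memcpy(&bits, &biased, sizeof(bits));
  return (uint8_t) (bits - vimagic);
}
```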
What is not done:
- XZP kernels are not changed as part of this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35856
Differential Revision: D20996325
Pulled By: kimishpatel
fbshipit-source-id: 7a7a18b09dd2564768142371db06d98bf7479f49