[quant] Release qnnpack original weights for conv/linear (#37595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37595
QNNPACK does not currently provide an unpack function, so we keep the original weights inside the packed structure and return them directly to the user when unpack is called.
However, in memory-constrained environments such as mobile, keeping these extra weights in memory is expensive. On mobile we therefore release the original weights after packing to free that memory. As a side effect, the user can no longer call unpack on mobile once the model has been run.
The change is gated by C10_MOBILE, which is defined for mobile builds.
The change saves 36 MB of on-device memory for the speech model.
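The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: PackedWeightSketch, kReleaseOriginalWeights, run, can_unpack, and unpack are hypothetical names, the "packing" is a plain copy standing in for QNNPACK's layout, and the assumption that packing happens lazily on the first run is made only for illustration. The only identifier taken from the summary is the C10_MOBILE macro.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Assumption: C10_MOBILE is defined by the build system for mobile builds.
#ifdef C10_MOBILE
constexpr bool kReleaseOriginalWeights = true;
#else
constexpr bool kReleaseOriginalWeights = false;
#endif

// Hypothetical stand-in for a packed-weight holder (not the real PyTorch struct).
class PackedWeightSketch {
 public:
  explicit PackedWeightSketch(std::vector<std::int8_t> w) : orig_(std::move(w)) {
    assert(!orig_.empty());
  }

  // First call "packs" the weights; a real backend would reorder them here.
  void run() {
    if (!packed_done_) {
      packed_ = orig_;  // placeholder for the backend-specific packing step
      packed_done_ = true;
      if (kReleaseOriginalWeights) {
        // Swap with an empty vector so the buffer is actually freed.
        std::vector<std::int8_t>().swap(orig_);
        released_ = true;
      }
    }
    // ... the convolution or linear computation would use packed_ here ...
  }

  bool can_unpack() const { return !released_; }

  // Returns the stored original weights, or throws if they were released.
  const std::vector<std::int8_t>& unpack() const {
    if (released_) {
      throw std::runtime_error("original weights were released after packing");
    }
    return orig_;
  }

 private:
  std::vector<std::int8_t> orig_;    // kept only so unpack() has something to return
  std::vector<std::int8_t> packed_;  // what the kernels would consume
  bool packed_done_ = false;
  bool released_ = false;
};
```

In a build without C10_MOBILE, unpack keeps working after run; with C10_MOBILE defined, the sketch frees the original buffer on the first run and unpack throws afterwards, which mirrors the side effect noted in the summary.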
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21365495
fbshipit-source-id: 66465ea0b4a10d44187d150edfb90d989e872b65