Cuda Support for Learnable Fake Quantize Per Channel (GPU) (#41262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41262
This diff adds a CUDA implementation of the learnable fake quantize per channel kernels, so they can run on GPU.
Test Plan: On a devvm, run `buck test //caffe2/test:quantization -- learnable` to test both the forward and backward passes of the learnable per channel fake quantize kernels. When a GPU is available, the test also exercises the `cuda` version.
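For reference, here is a minimal pure-Python sketch of the math a per-channel learnable fake quantize computes. It is not the CUDA kernel from this diff. The function names are hypothetical. It assumes channels lie along axis 0, zero points are integer-valued, rounding uses a straight-through estimator, and the scale gradient follows the LSQ-style formulation with no extra gradient scaling factor.

```python
def fake_quantize_per_channel(x, scales, zero_points, quant_min, quant_max):
    """Forward: x is a list of channels (axis 0), each a list of floats."""
    out = []
    for row, s, z in zip(x, scales, zero_points):
        # Quantize (round half to even, like std::nearbyint), clamp, dequantize.
        out.append([(min(max(round(v / s) + z, quant_min), quant_max) - z) * s
                    for v in row])
    return out


def fake_quantize_per_channel_backward(x, grad_y, scales, zero_points,
                                       quant_min, quant_max):
    """Backward: returns (grad_x, grad_scale, grad_zero_point) per channel."""
    grad_x, grad_s, grad_z = [], [], []
    for row, gy_row, s, z in zip(x, grad_y, scales, zero_points):
        gx_row, gs, gz = [], 0.0, 0.0
        for v, gy in zip(row, gy_row):
            q = round(v / s) + z
            if q < quant_min:
                # Clamped low: y = (quant_min - z) * s, so x gets no gradient.
                gx_row.append(0.0)
                gs += gy * (quant_min - z)
                gz += gy * -s
            elif q > quant_max:
                # Clamped high: y = (quant_max - z) * s.
                gx_row.append(0.0)
                gs += gy * (quant_max - z)
                gz += gy * -s
            else:
                # In range: straight-through for x; zero point cancels out.
                gx_row.append(gy)
                gs += gy * (q - z - v / s)
        grad_x.append(gx_row)
        grad_s.append(gs)
        grad_z.append(gz)
    return grad_x, grad_s, grad_z
```

Each channel accumulates its own scale and zero-point gradient. That per-channel reduction is the main difference from the per tensor variant, which reduces over the whole input.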
Reviewed By: vkuzo
Differential Revision: D22478832
fbshipit-source-id: 2731bd8b57bc83416790f6d65ef42d450183873c