CUDA Support for Learnable Fake Quantize Per Tensor (GPU) (#41127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41127
This diff adds a CUDA implementation of the learnable per-tensor fake quantize kernels so that they can run on the GPU.
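For reviewers unfamiliar with the operation, the following is a minimal pure-Python sketch of what a learnable per-tensor fake quantize computes for a single element. It is not the kernel code from this diff: the function names are hypothetical, the default range of 0 to 255 is an assumption, and the gradient rules shown (straight-through for the input, LSQ-style for `scale` and `zero_point`) are an assumption about the usual formulation rather than a transcription of the implementation.

```python
def fake_quantize(x, scale, zero_point, quant_min=0, quant_max=255):
    """Forward for one element: quantize, clamp to the range, dequantize."""
    q = round(x / scale + zero_point)          # round half to even
    q = min(max(q, quant_min), quant_max)      # clamp to [quant_min, quant_max]
    return (q - zero_point) * scale


def fake_quantize_grads(x, scale, zero_point, grad_out,
                        quant_min=0, quant_max=255):
    """Backward for one element (assumed rules, see the note above).

    Returns (grad_x, grad_scale, grad_zero_point). In a per-tensor kernel,
    grad_scale and grad_zero_point would be summed over all elements.
    """
    q = round(x / scale + zero_point)
    if q < quant_min:
        # Clamped below: no gradient reaches x.
        return 0.0, grad_out * (quant_min - zero_point), grad_out * (-scale)
    if q > quant_max:
        # Clamped above: no gradient reaches x.
        return 0.0, grad_out * (quant_max - zero_point), grad_out * (-scale)
    # In range: straight-through for x, rounding error drives the scale.
    x_fq = (q - zero_point) * scale
    return grad_out, grad_out * (x_fq - x) / scale, 0.0


if __name__ == "__main__":
    print(fake_quantize(0.26, 0.1, 0))
    print(fake_quantize_grads(0.26, 0.1, 0, 1.0))
    print(fake_quantize_grads(100.0, 0.1, 0, 1.0))
```

Because `scale` and `zero_point` are shared by the whole tensor, their gradients need a reduction over all elements, which is the part that differs most between a CPU and a GPU kernel.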
Test Plan: On a devvm, run `buck test //caffe2/test:quantization -- learnable` to exercise both the forward and backward passes of the learnable per-tensor fake quantize kernels. The tests also cover the `cuda` device when a GPU is available.
Reviewed By: z-a-f
Differential Revision: D22435037
fbshipit-source-id: 515afde13dd224d21fd47fb7cb027ee8d704cbdd