Replace empty_affine_quantizer with new_qtensor_cpu. (#36814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36814
ghstack-source-id: 103218412
From the flamegraph, it appears roughly 40% of the time is spent going through the dispatch stack. In quantized models, where the compute itself can take less time, such per-call overheads become noticeable.
{F234432545}
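The effect described above can be illustrated with a small, self-contained microbenchmark. This is not PyTorch's dispatcher; it is a hypothetical analogy in plain Python showing that when the underlying kernel is cheap, an extra dispatch layer (name lookup plus an extra call frame) accounts for a measurable fraction of total runtime:

```python
import timeit

# Hypothetical illustration only (not PyTorch's actual dispatch stack):
# a trivially cheap "kernel" wrapped by a lookup-based dispatch layer.
def kernel(x):
    return x + 1

OP_TABLE = {"add_one": kernel}

def dispatch(op_name, x):
    # Simulated dispatch: table lookup plus one extra call frame.
    return OP_TABLE[op_name](x)

n = 100_000
direct = timeit.timeit(lambda: kernel(1), number=n)
dispatched = timeit.timeit(lambda: dispatch("add_one", 1), number=n)
overhead_fraction = (dispatched - direct) / dispatched
print(f"dispatch overhead fraction: {overhead_fraction:.0%}")
```

The heavier the real kernel, the smaller this fraction becomes; conversely, for cheap quantized kernels the fixed dispatch cost looms large, which is the motivation for bypassing the dispatch stack with a direct call like new_qtensor_cpu.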
Test Plan: Quantized op tests.
Reviewed By: jerryzh168
Differential Revision: D21093840
fbshipit-source-id: 1b98b57eae403353596fc31171069d2f43b13385