optimize FloatToFused8BitRowwiseQuantized and Fused8BitRowwiseQuantizedToFloat (#31470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470
Optimize the performance of these two operators.
Additionally, use nearbyint instead of round, for consistency with 4-bit embedding table quantization.
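
For readers unfamiliar with the rounding change, the sketch below illustrates it in isolation. It is not the operator code from this PR. The helper names (QuantizeRowSketch, DequantizeRowSketch), the constant-row guard, and the assumed row layout (uint8 values followed by a float scale and a float bias) are assumptions made for illustration. std::round rounds halfway cases away from zero, while std::nearbyint follows the current floating-point rounding mode, which is round-half-to-even by default.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper for illustration only; not the Caffe2 operator.
// Quantizes one row to 8 bits and appends a float scale and a float bias
// (layout assumed for this sketch).
std::vector<std::uint8_t> QuantizeRowSketch(const std::vector<float>& row) {
  assert(!row.empty());
  const auto mm = std::minmax_element(row.begin(), row.end());
  const float bias = *mm.first;
  const float range = *mm.second - bias;
  // Guard against a constant row, where the range would be zero.
  const float scale = range > 0.0f ? range / 255.0f : 1.0f;

  std::vector<std::uint8_t> out(row.size() + 2 * sizeof(float));
  for (std::size_t i = 0; i < row.size(); ++i) {
    // nearbyint follows the current rounding mode (half-to-even by default),
    // whereas std::round would round halfway cases away from zero.
    out[i] = static_cast<std::uint8_t>(std::nearbyint((row[i] - bias) / scale));
  }
  std::memcpy(out.data() + row.size(), &scale, sizeof(float));
  std::memcpy(out.data() + row.size() + sizeof(float), &bias, sizeof(float));
  return out;
}

// Hypothetical inverse: reads scale and bias from the row tail and rescales.
std::vector<float> DequantizeRowSketch(const std::vector<std::uint8_t>& fused) {
  assert(fused.size() >= 2 * sizeof(float));
  const std::size_t n = fused.size() - 2 * sizeof(float);
  float scale = 0.0f;
  float bias = 0.0f;
  std::memcpy(&scale, fused.data() + n, sizeof(float));
  std::memcpy(&bias, fused.data() + n + sizeof(float), sizeof(float));

  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = static_cast<float>(fused[i]) * scale + bias;
  }
  return out;
}
```

With the row {0, 2.5, 3.5, 255}, the scale is exactly 1 and the bias is 0, so under the default rounding mode 2.5 quantizes to 2 and 3.5 to 4, whereas std::round would map 2.5 to 3. Results therefore differ from the previous behavior only for values that land exactly halfway between two quantization levels.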
Reviewed By: hyuen
Differential Revision: D19072103
fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff