disable gradient quantization in 8bw (#101739)
Summary:
Quantizing the *gradient* is not viable for a complex ASR model.
Gradient in INT8: f438266519
Gradient in FP32: f438109197
The two WER results clearly show the limitation of quantizing the gradient.
For now, we keep quantized backpropagation enabled but compute the gradient in FP32.
This already saves memory due to the reduced model size.
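For illustration only, a minimal sketch of the idea (quantized weights in the forward pass, FP32 gradient in the backward pass via a straight-through estimator); the class names are hypothetical and this is not the actual 8bw kernel path:

```python
import torch

class Int8WeightFP32Grad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight):
        # Symmetric per-tensor INT8 fake quantization of the weight.
        scale = weight.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(weight / scale), -128, 127)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: the gradient is passed back in FP32, not quantized.
        return grad_output

class QuantizedLinear(torch.nn.Linear):
    def forward(self, x):
        w_q = Int8WeightFP32Grad.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)
```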
Test Plan: Signals
Differential Revision: D45965552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101739
Approved by: https://github.com/izaitsevfb