Per discussion at https://github.com/pytorch/pytorch/pull/21244, fix bugs in (#21392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21392
As discussed at https://github.com/pytorch/pytorch/pull/21244, we
found that some values in log_beta are not properly initialized. This diff will 1)
initialize all of log_beta to -inf; 2) fix a tricky comparison condition; 3) zero all
the gradient elements corresponding to padding.
Offline experiments show that this diff fixes the previously seen NaN loss.
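
A minimal illustration of points 1) and 3), written in plain Python rather than
the actual kernel code. The helper names (make_log_beta, log_add,
zero_padding_grad) and the list-of-lists layout are hypothetical and only serve
to show why -inf initialization and zeroed padding keep NaN out of the result.
The compare condition from point 2) is not shown because its details are not
given here.

```python
import math

NEG_INF = float("-inf")


def make_log_beta(num_steps, num_states):
    # Fill every cell with -inf, i.e. log(0), so a cell that is never
    # written reads as "impossible" instead of holding stale memory.
    return [[NEG_INF] * num_states for _ in range(num_steps)]


def log_add(a, b):
    # log(exp(a) + exp(b)) that treats -inf as log(0).
    # Without the early returns, (-inf) - (-inf) would produce NaN.
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def zero_padding_grad(grad, valid_steps):
    # Rows at or beyond valid_steps correspond to padding; overwrite them
    # with zeros so that garbage or NaN there cannot leak into the update.
    for t in range(valid_steps, len(grad)):
        grad[t] = [0.0] * len(grad[t])
    return grad
```

With this setup, an untouched log_beta cell contributes nothing when combined
via log_add, and gradient rows past the valid length are exactly zero.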
Differential Revision: D15637977
fbshipit-source-id: 477008a5e11aae946bd2aa401ab7e0c513421af0