Change to proper Xavier initialization; existing implementation was … (#1927)
Summary:
…resulting in a large negative bias, which killed all gradients through the following ReLU. https://paperswithcode.com/method/xavier-initialization
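For context, here is a minimal stdlib-only sketch of Xavier (Glorot) uniform initialization, where W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)) and the bias starts at zero. It shows why a large negative bias kills gradients through a following ReLU: every pre-activation becomes negative, so the ReLU mask (where gradients pass) is all zeros. The layer sizes, the helper names `xavier_uniform` and `linear_relu`, and the bias value of -100.0 are illustrative assumptions, not the benchmark model's actual code or values.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: W ~ U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

def linear_relu(x, weight, bias):
    """Compute relu(W x + b); also return the ReLU mask (1 where the gradient passes)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    out = [max(0.0, p) for p in pre]
    mask = [1 if p > 0 else 0 for p in pre]
    return out, mask

rng = random.Random(0)
fan_in, fan_out = 64, 32  # hypothetical layer sizes
W = xavier_uniform(fan_in, fan_out, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]

# Zero bias (as Xavier init intends): some units are active, so gradients flow.
_, mask_ok = linear_relu(x, W, [0.0] * fan_out)
# Large negative bias (hypothetical value): every pre-activation is negative,
# so the ReLU blocks all gradients.
_, mask_bad = linear_relu(x, W, [-100.0] * fan_out)
print("active units, zero bias:", sum(mask_ok))
print("active units, large negative bias:", sum(mask_bad))
```

In PyTorch the same weight scheme is available as `torch.nn.init.xavier_uniform_`, typically paired with a zero-initialized bias; the exact change made in this PR is not reproduced here.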
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1927
Reviewed By: davidberard98
Differential Revision: D49754019
Pulled By: xuzhao9
fbshipit-source-id: 436676afed9bcc0f464cd1b25465444a98a52b5a