transformers
68a894a5 - Fix uninitialized parameter in conformer relative attention. (#18368)

Fix uninitialized parameter in conformer relative attention. (#18368)

`torch.Tensor` creates an uninitialized tensor (as via `torch.empty`); this leads to nondeterministic behavior, poor initialization, and NaNs if you have an unlucky init. The paper does not specify the initialization for the bias terms, so zero seems like a reasonable choice: no bias initially. In practice the memory returned by `torch.Tensor` is usually zero-filled anyway, so this fix stays close to the previous behavior in the common case:

```
>>> torch.Tensor(100, 100).sum()
tensor(0.)
>>> torch.Tensor(100, 100).sum()
tensor(nan)
>>> torch.Tensor(100, 100).sum()
tensor(0.)
```
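A minimal sketch of the fix pattern (the class and parameter names below are illustrative assumptions, not the exact code from the commit): replace `torch.Tensor(...)` with `torch.zeros(...)` when declaring the bias parameters.

```
import torch
import torch.nn as nn

class ConformerRelAttentionBias(nn.Module):
    # Hypothetical minimal module illustrating the fix; the real class in
    # transformers is larger and these names are assumptions.
    def __init__(self, num_heads, head_size):
        super().__init__()
        # Buggy pattern: torch.Tensor allocates uninitialized memory, so the
        # values are whatever happened to be there (often zeros, sometimes NaNs):
        # self.pos_bias_u = nn.Parameter(torch.Tensor(num_heads, head_size))
        # Fixed pattern: deterministic zero init, i.e. no bias initially.
        self.pos_bias_u = nn.Parameter(torch.zeros(num_heads, head_size))
        self.pos_bias_v = nn.Parameter(torch.zeros(num_heads, head_size))

m = ConformerRelAttentionBias(num_heads=8, head_size=64)
assert not torch.isnan(m.pos_bias_u).any()  # always holds with zeros init
```

Zero init also has a nice property here: with the biases at zero, the relative-attention terms contribute nothing at first, so the module starts out behaving like plain attention and the biases are learned from there.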
Author: Piotr Dabkowski (committed 3 years ago)