transformers
ee5de0ba - BERT decoder: Fix causal mask dtype.

Commit
5 years ago
BERT decoder: Fix causal mask dtype. PyTorch < 1.3 requires the operands of a multiplication to have the same dtype. This requirement was violated when BERT ran in decoder mode with the default attention mask (i.e., attention_mask=None in the arguments). In particular, this broke Model2Model and caused the quickstart tutorial to fail.
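The kind of fix described here can be sketched as follows. This is a minimal illustration, not the commit's actual diff: the helper name `build_decoder_mask` and its signature are hypothetical, and the sketch assumes the default mask is an all-ones integer tensor. The comparison that builds the causal mask yields a boolean (or, in older PyTorch, uint8) tensor, so it is cast to the attention mask's dtype before the two are multiplied.

```python
import torch


def build_decoder_mask(batch_size, seq_length, attention_mask=None):
    """Hypothetical helper: combine a causal mask with a padding mask."""
    if attention_mask is None:
        # Assumed default: attend to every position.
        attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)

    seq_ids = torch.arange(seq_length)
    # causal_mask[b, i, j] is true when key position j <= query position i.
    causal_mask = (
        seq_ids[None, None, :].repeat(batch_size, seq_length, 1)
        <= seq_ids[None, :, None]
    )
    # The comparison result is not the same dtype as attention_mask.
    # Casting first keeps both operands of the multiplication consistent,
    # which older PyTorch versions require.
    causal_mask = causal_mask.to(attention_mask.dtype)

    # Shape: (batch_size, 1, seq_length, seq_length)
    return causal_mask[:, None, :, :] * attention_mask[:, None, None, :]


mask = build_decoder_mask(batch_size=2, seq_length=3)
```

On recent PyTorch versions the multiplication would succeed without the cast thanks to type promotion, so the explicit `.to(...)` mainly matters for the older versions the commit message mentions.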