Commit
4 years ago
Update argument checks. hidden_size % attention_heads == 0 is already handled above when dealing with kv_channels. Add a check for decoder sequence length.
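The diff itself is not shown here, so the following is only a minimal sketch of the kind of validation the message describes, not the code from this commit. The function name `validate_args`, the field names `attention_heads`, `decoder_seq_length` and `max_position_embeddings`, and the exact bound used for the decoder check are assumptions made for illustration.

```python
from types import SimpleNamespace


def validate_args(args):
    """Hypothetical argument validation; all field names are assumptions."""
    # The divisibility check sits next to the kv_channels derivation,
    # so it does not need to be repeated further down.
    if args.kv_channels is None:
        if args.hidden_size % args.attention_heads != 0:
            raise ValueError("hidden_size must be divisible by attention_heads")
        args.kv_channels = args.hidden_size // args.attention_heads

    # Assumed form of the decoder check: the decoder sequence length
    # must fit within the available position embeddings.
    if args.decoder_seq_length is not None:
        if args.decoder_seq_length > args.max_position_embeddings:
            raise ValueError(
                "decoder_seq_length must not exceed max_position_embeddings"
            )
    return args


if __name__ == "__main__":
    args = validate_args(
        SimpleNamespace(
            hidden_size=1024,
            attention_heads=16,
            kv_channels=None,
            decoder_seq_length=128,
            max_position_embeddings=512,
        )
    )
    print(args.kv_channels)
```

Raising `ValueError` rather than using `assert` is a choice made for this sketch so the checks survive `python -O`; the original code may well use assertions instead.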