transformers
a4ec22ae - Most probably needs an explicit registry of the attention classes, to avoid holding references to them in the decoder layer. Will do that in a bit
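
The commit note only sketches the intent, so the following is a minimal, hypothetical illustration of what an explicit attention-class registry could look like: a dict keyed by implementation name, with the decoder layer looking classes up by string instead of storing class references itself. All names here (ATTENTION_CLASSES, register_attention, EagerAttention, DecoderLayer) are assumptions for illustration, not identifiers from the commit or the transformers codebase.

```python
import torch
import torch.nn as nn

# Hypothetical registry: implementation name -> attention class.
ATTENTION_CLASSES: dict[str, type[nn.Module]] = {}

def register_attention(name: str):
    """Register an attention class under a string key, so layers can
    resolve it by name rather than keeping class references around."""
    def decorator(cls: type[nn.Module]) -> type[nn.Module]:
        ATTENTION_CLASSES[name] = cls
        return cls
    return decorator

@register_attention("eager")
class EagerAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(hidden_states).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return self.out(torch.softmax(scores, dim=-1) @ v)

class DecoderLayer(nn.Module):
    def __init__(self, hidden_size: int, attn_implementation: str = "eager"):
        super().__init__()
        # Resolve the attention class by name at construction time; the
        # layer only keeps the instantiated module, not the class mapping.
        self.self_attn = ATTENTION_CLASSES[attn_implementation](hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.self_attn(hidden_states)
```

With this pattern, adding a new attention backend only means decorating the new class with register_attention("name"); decoder layers stay unchanged and select it via the attn_implementation string.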
