transformers
205bc415 - Fix GPT-NeoX-20B past handling, attention computation (#17811)

Committed 3 years ago
Fix GPT-NeoX-20B past handling, attention computation (#17811)

* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests
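The commit message does not say how the attention computation was changed. One common source of NaN in half-precision attention is the raw product q·kᵀ. It can exceed the float16 maximum (about 65504) and become inf before it is divided by sqrt(head_dim). A max-subtracted softmax over a row containing inf then computes inf - inf = NaN. The following is an illustrative sketch, not this commit's diff. It assumes NumPy and uses hypothetical helper names (`attention_scores_scale_after`, `attention_scores_scale_before`, `softmax`). It shows that applying the same scale to the query before the product keeps the intermediates in range.

```python
import numpy as np

# Illustrative sketch only (hypothetical helpers, not the transformers code):
# compares two mathematically equal orderings of scaled dot-product scores
# when the inputs are float16.


def attention_scores_scale_after(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Multiply q @ k.T in float16, then divide by sqrt(head_dim).

    The unscaled product can overflow float16 to inf before the division.
    """
    norm = np.float16(np.sqrt(q.shape[-1]))
    with np.errstate(over="ignore"):  # silence the float16 overflow warning
        return (q @ k.T) / norm


def attention_scores_scale_before(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Divide q by sqrt(head_dim) first, then multiply by k.T in float16."""
    norm = np.float16(np.sqrt(q.shape[-1]))
    return (q / norm) @ k.T


def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax in float32 with max subtraction.

    If a row's max is inf, inf - inf yields NaN, which spreads to every weight.
    """
    x = scores.astype(np.float32)
    with np.errstate(invalid="ignore"):  # silence inf - inf and nan / nan warnings
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    head_dim = 64  # sqrt(64) = 8
    q = np.full((1, head_dim), 40, dtype=np.float16)  # one query
    k = np.full((2, head_dim), 40, dtype=np.float16)  # two keys

    # Raw dot product is 40 * 40 * 64 = 102400 > 65504, so it overflows to inf.
    print(softmax(attention_scores_scale_after(q, k)))   # [[nan nan]]
    # Scaled first: 5 * 40 * 64 = 12800, which float16 represents exactly.
    print(softmax(attention_scores_scale_before(q, k)))  # [[0.5 0.5]]
```

Both orderings compute q·kᵀ / sqrt(head_dim). Only the range of the intermediate values differs. In PyTorch, the same idea can be expressed by folding the scale factor into the product through the `alpha` argument of `torch.baddbmm`. Whether this commit uses exactly that form is not stated here.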