transformers
1f33023c
- Flash-attn performance: remove cuda sync during inference (#33570)
Committed 1 year ago
Flash-attn performance: remove cuda sync during inference (#33570)

Switch conditions to use short-circuit during inference
References
#33570 - Flash-attn performance: remove cuda sync during inference
Author
Cyrilvallez
Parents
4953ddf0