llama.cpp
ee1d670c
- parallel : fix bug (extra BOS) + smaller token_prev array
Commit
1 year ago
parallel : fix bug (extra BOS) + smaller token_prev array
References
#3228 - llama : custom attention mask + parallel decoding + no context swaps
Author
ggerganov
Parents
1be2b8c1