llama.cpp
Commit eaa13a48
falcon : fix CUDA inference by making K and Q contiguous (#2830)
Committed 2 years ago
falcon : fix CUDA inference by making K and Q contiguous (#2830)

* falcon : fix CUDA inference by making K and Q contiguous

  ggml-ci

* cuda : add assert to guard from non-cont ropes
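The message summarizes both halves of the fix: after RoPE is applied in the Falcon graph, the K and Q tensors can be non-contiguous views, which the CUDA backend cannot consume, so they are copied into contiguous memory, and the CUDA side gains an assert against non-contiguous rope inputs. A minimal sketch of the idea in C, assuming the ggml API (ggml_cont, ggml_is_contiguous, and GGML_ASSERT are real ggml calls; the helper name and the surrounding graph code are hypothetical, not the commit's verbatim diff):

#include "ggml.h"

// Hypothetical helper: make sure a tensor is laid out contiguously before
// the CUDA backend consumes it. ggml_cont() adds a copy node to the graph
// that materializes the tensor in contiguous memory; if the tensor is
// already contiguous, the copy is skipped.
static struct ggml_tensor * ensure_cont(struct ggml_context * ctx,
                                        struct ggml_tensor  * t) {
    return ggml_is_contiguous(t) ? t : ggml_cont(ctx, t);
}

// In the Falcon graph build, the roped tensors would then be wrapped:
//     Kcur = ensure_cont(ctx0, Kcur);
//     Qcur = ensure_cont(ctx0, Qcur);
//
// and the CUDA rope path would guard its input along the lines of:
//     GGML_ASSERT(ggml_is_contiguous(src0));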
References
#2830 - falcon : fix CUDA inference by making K and Q contiguous
Author
ggerganov
Parents
da7455d0