llama.cpp
b1a5bd4e - CUDA: better coalesce data-access for contiguous concat (#22330)
Commit
22 days ago
CUDA: better coalesce data-access for contiguous concat (#22330)

Also, distribute all elements across CTAs evenly instead of launching one CTA per dim
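The following is a minimal sketch of the two ideas in the commit message. It is not the actual llama.cpp kernel. It assumes float32 data and contiguous row-major tensors `a` and `b` that agree on every dimension except the concat dimension. Each output "row" is then `a_chunk` contiguous elements from `a` followed by `b_chunk` contiguous elements from `b`. The names `concat_src`, `concat_contiguous_f32`, `launch_concat_contiguous_f32`, `a_chunk`, `b_chunk`, and `n_rows` are hypothetical. The kernel uses a flat grid-stride loop, so the grid size depends only on the total element count rather than on the extent of any one dimension.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: maps flat output index i to its source.
// Returns true if the element comes from a, false if from b,
// and writes the flat offset into that source buffer to *src_off.
__host__ __device__ inline bool concat_src(int64_t i, int64_t a_chunk,
                                           int64_t b_chunk, int64_t *src_off) {
    const int64_t row_len = a_chunk + b_chunk;  // output elements per row
    const int64_t row = i / row_len;
    const int64_t col = i - row * row_len;
    if (col < a_chunk) {
        *src_off = row * a_chunk + col;
        return true;
    }
    *src_off = row * b_chunk + (col - a_chunk);
    return false;
}

// Hypothetical kernel: consecutive threads handle consecutive output
// elements, so writes to dst are contiguous and, within a row, reads from
// a (or b) are contiguous too.
__global__ void concat_contiguous_f32(const float *__restrict__ a,
                                      const float *__restrict__ b,
                                      float *__restrict__ dst,
                                      int64_t a_chunk, int64_t b_chunk,
                                      int64_t n_total) {
    const int64_t stride = (int64_t) blockDim.x * gridDim.x;
    // Grid-stride loop: all elements are spread evenly over all CTAs.
    for (int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
         i < n_total; i += stride) {
        int64_t off;
        const bool from_a = concat_src(i, a_chunk, b_chunk, &off);
        dst[i] = from_a ? a[off] : b[off];
    }
}

// Hypothetical host launcher (assumption: 256 threads per block).
inline cudaError_t launch_concat_contiguous_f32(const float *a, const float *b,
                                                float *dst, int64_t a_chunk,
                                                int64_t b_chunk, int64_t n_rows,
                                                cudaStream_t stream) {
    assert(a_chunk >= 0 && b_chunk >= 0 && n_rows >= 0);
    const int64_t n_total = n_rows * (a_chunk + b_chunk);
    if (n_total == 0) {
        return cudaSuccess;  // nothing to copy; avoids an empty launch
    }
    const int block = 256;
    const int64_t grid64 = (n_total + block - 1) / block;
    // Cap the grid; the grid-stride loop covers any remaining elements.
    const int grid = (int) (grid64 < 65535 ? grid64 : 65535);
    concat_contiguous_f32<<<grid, block, 0, stream>>>(a, b, dst, a_chunk,
                                                      b_chunk, n_total);
    return cudaGetLastError();
}
```

For example, concatenating a tensor with 2 elements per row and another with 3 elements per row, both with 4 rows, along the fastest dimension gives `a_chunk = 2`, `b_chunk = 3`, `n_rows = 4`. Only warps that straddle the a/b boundary inside a row read from two source buffers; every other warp reads and writes one contiguous range.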
References
#22330 - CUDA: better coalesce data-access for contiguous concat
Author
ORippler
Parents
0c6ee1ca