llama.cpp
b1a5bd4e - CUDA: better coalesce data-access for contiguous concat (#22330)

Commit
22 days ago
CUDA: better coalesce data-access for contiguous concat (#22330)

Also, distribute all elements across CTAs evenly instead of launching one CTA per dim
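
The idea in the message can be illustrated with a small standalone sketch. This is not the code from the PR: the kernel name `concat_contiguous_sketch`, the `chunk0`/`chunk1` parameters, the `float` element type, and the launch sizes are all assumptions made for illustration. It assumes both inputs are contiguous and that the concat reduces to alternating a chunk of `chunk0` elements from `src0` with a chunk of `chunk1` elements from `src1`. Each thread walks the flat output index in a grid-stride loop, so neighbouring threads touch neighbouring addresses and every CTA gets roughly the same number of elements.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: one flat index over the whole output.
// Consecutive threads write consecutive dst elements (coalesced stores),
// and reads are consecutive within each source chunk.
__global__ void concat_contiguous_sketch(const float * src0, const float * src1, float * dst,
                                         size_t chunk0, size_t chunk1, size_t n_total) {
    const size_t chunk  = chunk0 + chunk1;
    const size_t stride = (size_t) gridDim.x * blockDim.x;
    // Grid-stride loop: work is split evenly across CTAs regardless of tensor shape.
    for (size_t i = (size_t) blockIdx.x * blockDim.x + threadIdx.x; i < n_total; i += stride) {
        const size_t outer = i / chunk; // which (src0 chunk, src1 chunk) pair
        const size_t inner = i % chunk; // position inside that pair
        dst[i] = inner < chunk0 ? src0[outer * chunk0 + inner]
                                : src1[outer * chunk1 + (inner - chunk0)];
    }
}

int main() {
    // Illustrative sizes only (assumed, not taken from the PR).
    const size_t n_outer = 64, chunk0 = 96, chunk1 = 160;
    const size_t n0 = n_outer * chunk0, n1 = n_outer * chunk1, n_total = n0 + n1;

    std::vector<float> h0(n0), h1(n1), hdst(n_total);
    for (size_t i = 0; i < n0; ++i) h0[i] = (float) i;
    for (size_t i = 0; i < n1; ++i) h1[i] = -(float) i - 1.0f;

    float * d0 = nullptr, * d1 = nullptr, * ddst = nullptr;
    CUDA_CHECK(cudaMalloc(&d0,   n0      * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d1,   n1      * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&ddst, n_total * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d0, h0.data(), n0 * sizeof(float), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d1, h1.data(), n1 * sizeof(float), cudaMemcpyHostToDevice));

    // Deliberately fewer threads than elements so the grid-stride loop iterates.
    const int block = 256;
    const int grid  = 8;
    concat_contiguous_sketch<<<grid, block>>>(d0, d1, ddst, chunk0, chunk1, n_total);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(hdst.data(), ddst, n_total * sizeof(float), cudaMemcpyDeviceToHost));

    // CPU reference: for each outer index, a src0 chunk followed by a src1 chunk.
    size_t mismatches = 0;
    for (size_t o = 0; o < n_outer; ++o) {
        for (size_t j = 0; j < chunk0 + chunk1; ++j) {
            const float expected = j < chunk0 ? h0[o * chunk0 + j] : h1[o * chunk1 + (j - chunk0)];
            if (hdst[o * (chunk0 + chunk1) + j] != expected) ++mismatches;
        }
    }
    printf("mismatches: %zu\n", mismatches);

    CUDA_CHECK(cudaFree(d0));
    CUDA_CHECK(cudaFree(d1));
    CUDA_CHECK(cudaFree(ddst));
    return mismatches == 0 ? 0 : 1;
}
```

Compared with launching one CTA per slice of a dimension, a flat grid-stride loop keeps the per-CTA workload independent of the tensor shape, which is presumably the motivation for the second sentence of the message. The integer division and modulo per element are the cost of this simple form; a real kernel might hoist them or use vectorized loads.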