llama.cpp
34c9d765
CUDA: add attention sinks for tile and wmma (#15178)
Commit
29 days ago
CUDA: add attention sinks for tile and wmma (#15178)

* CUDA: add attention sinks for tile and wmma
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
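The change wires per-head attention sinks into the tile and wmma FlashAttention kernels. Conceptually, a sink is an extra per-head logit that adds probability mass to the softmax denominator without contributing a value vector. The sketch below is a hypothetical illustration of how such a sink can be folded into an online-softmax accumulator after the last KV block has been processed; the names (`apply_attention_sink`, `kqmax`, `kqsum`, `VKQ`, `DV`) are illustrative and not the kernels' actual identifiers.

```cuda
// Hypothetical sketch (not the llama.cpp source): fold a per-head attention
// sink into the online-softmax state in a FlashAttention-style epilogue.
__device__ void apply_attention_sink(
        const float sink,    // learned per-head sink logit
        float     & kqmax,   // running maximum of the attention logits
        float     & kqsum,   // running softmax denominator
        float     * VKQ,     // accumulated numerator (softmax-weighted V)
        const int   DV) {    // V head dimension
    const float kqmax_new = fmaxf(kqmax, sink);
    const float scale_old = expf(kqmax - kqmax_new); // rescale old state if the sink becomes the new max

    // The sink adds mass to the denominator only; it carries no value vector,
    // so the numerator is merely rescaled to the new maximum.
    kqsum = kqsum*scale_old + expf(sink - kqmax_new);
    for (int i = 0; i < DV; ++i) {
        VKQ[i] *= scale_old;
    }
    kqmax = kqmax_new;
}
```

In effect the sink behaves like one extra KV position whose value vector is zero, which slightly dampens every attention weight for that head.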
References
#15178 - CUDA: add attention sinks for tile and wmma
Author
am17an
Parents
e54d41be