llama.cpp
34c9d765
CUDA: add attention sinks for tile and wmma (#15178)
Commit
29 days ago
CUDA: add attention sinks for tile and wmma (#15178)

* CUDA: add attention sinks for tile and wmma
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
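The change wires per-head attention sinks into the tile and wmma FlashAttention kernels. Conceptually, a sink is an extra per-head logit that adds probability mass to the softmax denominator without contributing a value vector. The sketch below is a hypothetical illustration of how such a sink can be folded into an online-softmax accumulator after the last KV block has been processed; the names (`apply_attention_sink`, `kqmax`, `kqsum`, `VKQ`, `DV`) are illustrative and not the kernels' actual identifiers.

```cuda
// Hypothetical sketch (not the llama.cpp source): fold a per-head attention
// sink into the online-softmax state in a FlashAttention-style epilogue.
__device__ void apply_attention_sink(
        const float sink,    // learned per-head sink logit
        float     & kqmax,   // running maximum of the attention logits
        float     & kqsum,   // running softmax denominator
        float     * VKQ,     // accumulated numerator (softmax-weighted V)
        const int   DV) {    // V head dimension
    const float kqmax_new = fmaxf(kqmax, sink);
    const float scale_old = expf(kqmax - kqmax_new); // rescale old state if the sink becomes the new max

    // The sink adds mass to the denominator only; it carries no value vector,
    // so the numerator is merely rescaled to the new maximum.
    kqsum = kqsum*scale_old + expf(sink - kqmax_new);
    for (int i = 0; i < DV; ++i) {
        VKQ[i] *= scale_old;
    }
    kqmax = kqmax_new;
}
```

In effect the sink behaves like one extra KV position whose value vector is zero, which slightly dampens every attention weight for that head.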
References
#15178 - CUDA: add attention sinks for tile and wmma
Author
am17an
Parents
e54d41be