llama.cpp
5.5x more CUDA performance with 5 minutes of work
#2140
Merged


JohannesGaessler CUDA: add __restrict__ to mul mat vec kernels
c8abd83c
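
For readers unfamiliar with the qualifier named in the commit title: `__restrict__` promises the compiler that the annotated pointers do not alias each other. With that guarantee the compiler is free to reorder loads and, for `const` inputs, to route them through the read-only data cache. The sketch below shows the idea on a simplified float matrix-vector kernel. It is not the code from this PR: the kernel name `mul_mat_vec_f32`, the one-warp-per-row layout, and the test sizes are all assumptions made for illustration, and any speedup depends on the kernel and the GPU.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from llama.cpp): y = A * x for a row-major float matrix.
// Launched with one 32-thread block (a single warp) per matrix row.
// __restrict__ tells the compiler that A, x and y never overlap in memory.
__global__ void mul_mat_vec_f32(const float * __restrict__ A,
                                const float * __restrict__ x,
                                float * __restrict__ y,
                                const int ncols) {
    const int row = blockIdx.x;
    const int tid = threadIdx.x;

    // Each thread accumulates a strided slice of the row.
    float sum = 0.0f;
    for (int col = tid; col < ncols; col += warpSize) {
        sum += A[row * ncols + col] * x[col];
    }

    // Butterfly reduction of the partial sums across the warp.
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, offset, 32);
    }

    if (tid == 0) {
        y[row] = sum;
    }
}

int main() {
    const int nrows = 64, ncols = 256; // illustrative sizes only

    std::vector<float> hA(nrows * ncols), hx(ncols), hy(nrows);
    for (int i = 0; i < nrows * ncols; ++i) hA[i] = (i % 7) * 0.25f;
    for (int j = 0; j < ncols; ++j)         hx[j] = (j % 5) * 0.5f;

    float *dA, *dx, *dy;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dx, hx.size() * sizeof(float));
    cudaMalloc(&dy, hy.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), hx.size() * sizeof(float), cudaMemcpyHostToDevice);

    mul_mat_vec_f32<<<nrows, 32>>>(dA, dx, dy, ncols);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hy.data(), dy, hy.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a plain CPU reference.
    float max_err = 0.0f;
    for (int r = 0; r < nrows; ++r) {
        float ref = 0.0f;
        for (int c = 0; c < ncols; ++c) ref += hA[r * ncols + c] * hx[c];
        max_err = fmaxf(max_err, fabsf(ref - hy[r]));
    }
    printf("max abs error vs CPU reference: %g\n", max_err);

    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return max_err < 1e-3f ? 0 : 1;
}
```

Note that `__restrict__` is a promise, not a check: passing overlapping buffers to a kernel annotated this way is undefined behavior, so the qualifier only belongs on kernels whose inputs and outputs are known to be distinct.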
slaren approved these changes on 2023-07-07
JohannesGaessler merged 061f5f8d into master 2 years ago

