llama.cpp
CUDA performance optimizations
#1530
Merged

JohannesGaessler
SlyEcho
bluefireexplosion
JohannesGaessler added the performance label
JohannesGaessler commented on 2023-05-20
howard0su commented on 2023-05-20
ggerganov commented on 2023-05-20
JohannesGaessler force pushed to b00c58c3 2 years ago
JohannesGaessler force pushed 2 years ago
ggerganov approved these changes on 2023-05-21
JohannesGaessler xor hack
fbf5588a
JohannesGaessler block y dim
1a787101
JohannesGaessler loop unrolling
82cf01f8
JohannesGaessler Fixed cmake LLAMA_CUDA_BY option
17dc4c52
JohannesGaessler Removed hipblas compatibility code
5d0cf992
JohannesGaessler Define GGML_CUDA_DMMV_BLOCK_Y if not defined
e199938a
JohannesGaessler Fewer iters, more ops per iter
98bfee01
JohannesGaessler force pushed to 3698cd08 2 years ago
ggerganov approved these changes on 2023-05-23
JohannesGaessler Renamed DMMV X/Y compilation options
d45df1b1
JohannesGaessler force pushed from 3698cd08 to d45df1b1 2 years ago
ggerganov merged 1fcdcc28 into master 2 years ago
KerfuffleV2 assigned KerfuffleV2 2 years ago
KerfuffleV2 unassigned KerfuffleV2 2 years ago