CUDA performance optimizations #1530
ggerganov approved these changes on 2023-05-21
xor hack
fbf5588a
block y dim
1a787101
loop unrolling
82cf01f8
Fixed cmake LLAMA_CUDA_BY option
17dc4c52
Removed hipblas compatibility code
5d0cf992
Define GGML_CUDA_DMMV_BLOCK_Y if not defined
e199938a
Fewer iters, more ops per iter
98bfee01
ggerganov approved these changes on 2023-05-23
Renamed DMMV X/Y compilation options
d45df1b1
ggerganov merged 1fcdcc28 into master 2 years ago