llama.cpp
GPU-accelerated token generation (new quantization format)
#1412
Merged


ggerganov commented on 2023-05-12
ggerganov commented on 2023-05-12
github-actions commented on 2023-05-12
JohannesGaessler: CUDA kernel for q4_0 dequant. + mat. vec. mult. (637be12f)
JohannesGaessler: Added q4_1 via template (12fc292e)
JohannesGaessler: Added missing __syncthreads(); (7dc2f57e)
JohannesGaessler: --gpu_layers -> --gpu-layers (f0af4757)
JohannesGaessler force pushed from 1e735d2d to f0af4757 2 years ago
SlyEcho commented on 2023-05-12
slaren commented on 2023-05-12
JohannesGaessler: Shorter dequantize_mul_mat_vec line (0986c2f4)
JohannesGaessler: q5_0 dequantize_mul_mat kernel (9da44fdc)
JohannesGaessler: More readable dequantize_mul_mat_vec logic (5a0ecf76)
JohannesGaessler: dequantize_mul_mat_vec kernels for q5_1, q8_0, f16 (bb0993ed)
JohannesGaessler force pushed from 991ef9e6 to bb0993ed 2 years ago
JohannesGaessler marked this pull request as ready for review 2 years ago
JohannesGaessler added the performance label
ggerganov: llama : offload "output" tensor to GPU too + coding style fixes (ad8a9e69)
ggerganov approved these changes on 2023-05-13
ggerganov merged 905d87b7 into master 2 years ago
github-actions commented on 2023-05-13
slaren commented on 2023-05-13
