llama.cpp — PR #1412 (merged)
GPU-accelerated token generation (new quantization format)
Commits (9)
- CUDA kernel for q4_0 dequant. + mat. vec. mult. (JohannesGaessler, 2 years ago)
- Added q4_1 via template (JohannesGaessler, 2 years ago)
- Added missing __syncthreads(); (JohannesGaessler, 2 years ago)
- --gpu_layers -> --gpu-layers (JohannesGaessler, 2 years ago)
- Shorter dequantize_mul_mat_vec line (JohannesGaessler, 2 years ago)
- q5_0 dequantize_mul_mat kernel (JohannesGaessler, 2 years ago)
- More readable dequantize_mul_mat_vec logic (JohannesGaessler, 2 years ago)
- dequantize_mul_mat_vec kernels for q5_1, q8_0, f16 (JohannesGaessler, 2 years ago)
- llama : offload "output" tensor to GPU too + coding style fixes (ggerganov, 2 years ago)
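The first commit adds a fused dequantize + matrix-vector-multiply CUDA kernel for q4_0. As a rough CPU-side sketch of what such a kernel computes per block: a q4_0 block stores a float scale plus 32 packed 4-bit quants, and each weight is reconstructed as (nibble - 8) * scale before being multiplied into the vector. The struct layout and the interleaved-nibble assumption below reflect the q4_0 format of that era; this is an illustrative sketch, not code from the PR.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK4_0 32 /* weights per quantization block */

/* One q4_0 block: a float scale plus 16 bytes holding 32 packed 4-bit quants. */
typedef struct {
    float   d;             /* scale factor for the whole block */
    uint8_t qs[QK4_0 / 2]; /* two 4-bit quants per byte */
} block_q4_0;

/* Dequantize one block and accumulate its dot product with a slice of y.
 * Assumes the interleaved layout: byte i holds elements 2*i (low nibble)
 * and 2*i + 1 (high nibble), each stored with a +8 offset. */
static float dot_q4_0_block(const block_q4_0 *b, const float *y) {
    float sum = 0.0f;
    for (int i = 0; i < QK4_0 / 2; ++i) {
        const int q0 = (b->qs[i] & 0x0F) - 8; /* low nibble  -> element 2*i     */
        const int q1 = (b->qs[i] >> 4)   - 8; /* high nibble -> element 2*i + 1 */
        sum += b->d * q0 * y[2 * i + 0];
        sum += b->d * q1 * y[2 * i + 1];
    }
    return sum;
}
```

In the actual GPU kernel this per-block loop is spread across CUDA threads, which is why one commit had to add a missing `__syncthreads();` before combining partial sums.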