llama.cpp
GPU-accelerated token generation (new quantization format)
#1412
Merged

Commits
  • CUDA kernel for q4_0 dequant. + mat. vec. mult.
    JohannesGaessler committed 2 years ago
  • Added q4_1 via template
    JohannesGaessler committed 2 years ago
  • Added missing __syncthreads();
    JohannesGaessler committed 2 years ago
  • --gpu_layers -> --gpu-layers
    JohannesGaessler committed 2 years ago
  • Shorter dequantize_mul_mat_vec line
    JohannesGaessler committed 2 years ago
  • q5_0 dequantize_mul_mat kernel
    JohannesGaessler committed 2 years ago
  • More readable dequantize_mul_mat_vec logic
    JohannesGaessler committed 2 years ago
  • dequantize_mul_mat_vec kernels for q5_1, q8_0, f16
    JohannesGaessler committed 2 years ago
  • llama : offload "output" tensor to GPU too + coding style fixes
    ggerganov committed 2 years ago
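The kernels listed above fuse dequantization with the matrix-vector product, so quantized weights never need to be expanded to a full float matrix in GPU memory. As a rough CPU-side illustration of the q4_0 arithmetic only (not the CUDA implementation in this PR), here is a Python sketch; the block layout assumed below — blocks of 32 weights, one scale `d` per block, each weight recovered as `d * (q - 8)` from an unsigned 4-bit value — follows the q4_0 format as commonly described for llama.cpp, and the function names are hypothetical:

```python
# Hedged sketch of q4_0 quantize + fused dequantize/mat-vec-mult.
# Assumption: q4_0 stores weights in blocks of 32, each block holding one
# float scale `d` and 32 unsigned 4-bit values; a weight is recovered as
# d * (q - 8). The CUDA kernels in the PR perform the dequantization
# inline with the dot product rather than materializing float weights.

QK = 32  # assumed q4_0 block size


def quantize_q4_0(row):
    """Quantize one row of floats into a list of (scale, nibbles) blocks."""
    blocks = []
    for i in range(0, len(row), QK):
        chunk = row[i:i + QK]
        amax = max(abs(x) for x in chunk) or 1.0
        d = amax / 7.0  # map [-amax, amax] onto signed range [-7, 7]
        # Shift by 8 so values fit in an unsigned 4-bit nibble [0, 15].
        qs = [min(15, max(0, round(x / d) + 8)) for x in chunk]
        blocks.append((d, qs))
    return blocks


def dequantize_mul_mat_vec(qrows, v):
    """y[i] = dot(dequantize(qrows[i]), v), dequantizing inside the loop."""
    y = []
    for blocks in qrows:
        acc = 0.0
        for b, (d, qs) in enumerate(blocks):
            for j, q in enumerate(qs):
                acc += d * (q - 8) * v[b * QK + j]  # dequantize, then multiply
        y.append(acc)
    return y
```

On the GPU, each thread block would instead accumulate partial dot products in parallel and reduce them (hence the `__syncthreads()` fix in the commit list); the arithmetic per weight is the same.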