llama.cpp
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0
#1508
Merged
ggerganov merged 5 commits into master from qnt-f16
ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0 (d627025c)
ggerganov added the performance label
ggerganov added the breaking change label
llama : bump LLAMA_FILE_VERSION to 3 (3094f642)
cuda : update Q4 and Q8 dequantize kernels (8b713297)
ggml : fix AVX dot products (a4434975)
readme : update performance table + hot topics (c6d82555)
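
The commit titles above only name the change, so here is a rough sketch of what storing the per-block scale as F16 instead of F32 could mean for block size. Everything in it is an assumption made for illustration and is not copied from the PR diff: the struct names (`sketch_*`), the `fp16_bits` typedef, and the block length of 32 weights are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK 32            /* assumed number of weights per block */

typedef uint16_t fp16_bits;     /* hypothetical stand-in: raw half-precision bits */

/* Assumed "before" layout: F32 scale plus 32 packed 4-bit weights. */
typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_q4_0_f32;

/* Assumed "after" layouts: F16 scale (and F16 minimum for Q4_1). */
typedef struct { fp16_bits d; uint8_t qs[SKETCH_QK / 2]; } sketch_q4_0_f16;
typedef struct { fp16_bits d; fp16_bits m; uint8_t qs[SKETCH_QK / 2]; } sketch_q4_1_f16;
typedef struct { fp16_bits d; int8_t qs[SKETCH_QK]; } sketch_q8_0_f16;

/* Compile-time checks of the sizes these sketch layouts produce. */
static_assert(sizeof(sketch_q4_0_f32) == 20, "4-byte scale + 16 bytes of nibbles");
static_assert(sizeof(sketch_q4_0_f16) == 18, "2-byte scale + 16 bytes of nibbles");
static_assert(sizeof(sketch_q4_1_f16) == 20, "2-byte scale + 2-byte min + 16 bytes");
static_assert(sizeof(sketch_q8_0_f16) == 34, "2-byte scale + 32 int8 weights");

/* Storage cost per weight, in bits, for a block of SKETCH_QK weights. */
static double bits_per_weight(size_t block_bytes) {
    return 8.0 * (double)block_bytes / (double)SKETCH_QK;
}
```

Under these assumptions, `bits_per_weight(sizeof(sketch_q4_0_f16))` is 4.5, compared with 5.0 for the F32-scale variant. A change of this kind alters the on-disk size of every block, which would be consistent with the file version bump and the breaking change label on this PR.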
ggerganov merged 2d5db483 into master 3 years ago
ggerganov deleted the qnt-f16 branch 3 years ago
Reviewers: No reviews
Assignees: No one assigned
Labels: performance, breaking change
Milestone: No milestone