llama.cpp
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0
#1508

Merged

ggerganov merged 5 commits into master from qnt-f16

ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0

d627025c
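
The commit title implies a smaller per-block scale factor. The following is a rough sketch of that effect on block size, not code taken from this PR. It assumes 32 weights per block and that each block stores its scale (plus a minimum for Q4_1) next to the packed quants. The struct and field names are hypothetical, and `uint16_t` stands in for a 16-bit half-precision value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* assumed number of weights per block */

/* Hypothetical layouts with an F32 scale (before the change). */
typedef struct { float d;    uint8_t qs[QK / 2]; } q4_0_f32_sketch;
typedef struct { float d, m; uint8_t qs[QK / 2]; } q4_1_f32_sketch;
typedef struct { float d;    int8_t  qs[QK];     } q8_0_f32_sketch;

/* Hypothetical layouts with an F16 scale (after the change).
   uint16_t holds the raw bits of a half-precision float. */
typedef struct { uint16_t d;    uint8_t qs[QK / 2]; } q4_0_f16_sketch;
typedef struct { uint16_t d, m; uint8_t qs[QK / 2]; } q4_1_f16_sketch;
typedef struct { uint16_t d;    int8_t  qs[QK];     } q8_0_f16_sketch;

/* Average storage cost per weight, in bits, for a block of n weights. */
static double bits_per_weight(size_t block_bytes, int n) {
    return (double)block_bytes * 8.0 / (double)n;
}

/* Compile-time checks of the sizes this sketch relies on. */
static_assert(sizeof(q4_0_f32_sketch) == 20, "F32 Q4_0 sketch is 20 bytes");
static_assert(sizeof(q4_0_f16_sketch) == 18, "F16 Q4_0 sketch is 18 bytes");
```

Under these assumptions each block shrinks by 2 bytes per stored scale value, so the Q4_0 sketch goes from 5.0 to 4.5 bits per weight. A change in on-disk block size of this kind is consistent with the breaking-change label and the file version bump listed on this page.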

ggerganov added performance

ggerganov added breaking change

llama : bump LLAMA_FILE_VERSION to 3

3094f642

cuda : update Q4 and Q8 dequantize kernels

8b713297

ggml : fix AVX dot products

a4434975

readme : update performance table + hot topics

c6d82555

ggerganov merged 2d5db483 into master 3 years ago

ggerganov deleted the qnt-f16 branch 3 years ago

Labels

performance, breaking change
