llama.cpp
Add Q8_0 quantization for intermediate results
#951
Merged


ggerganov merged 7 commits into master from q8_0
ggerganov added the help wanted, high priority, and generation quality labels
ggerganov commented on 2023-04-13
ggerganov requested a review from sw 2 years ago
sw commented on 2023-04-13
sw commented on 2023-04-13
sw commented on 2023-04-13
sw commented on 2023-04-13
ggerganov self-assigned this 2 years ago
ggerganov force-pushed to 05bf3ab6 2 years ago
ggerganov committed 3b894ec6: ggml : add Q8_0 quantization for intermediate results
ggerganov committed 19e7a657: quantize-stats : fix test + add it to Makefile default
sw committed 2c4f9b65: Q8: use int8_t, AVX/AVX2 optimizations
ggerganov committed 312a927f: ggml : fix quantize_row_q8_0() ARM_NEON rounding
ggerganov committed 3a111abd: minor : updates after rebase to latest master
ggerganov committed 01de5c54: quantize-stats : delete obsolete strings
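Assuming the "intermediate results" are the activation rows that get multiplied against 4-bit (Q4_0) weights, storing them as Q8_0 lets each block's dot product accumulate plain integer products and apply the two float scales once per block. The following scalar C sketch illustrates that idea under stated assumptions: 32-value blocks for both formats, a float scale per block, and a Q4_0 layout in which element `2j` is the low nibble of byte `j`, element `2j+1` the high nibble, and a stored nibble `n` decodes to `n - 8`. All names (`block_q4_0_sk`, `block_q8_0_sk`, `vec_dot_q4_0_q8_0_sk`) are hypothetical, and the AVX/AVX2 and ARM_NEON paths mentioned in the commits are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* values per block for both formats (assumed) */

/* Hypothetical Q4_0-style block: float scale, QK 4-bit quants packed two per byte.
   Element 2j is the low nibble of qs[j], element 2j+1 the high nibble;
   a stored nibble n decodes to the signed value n - 8. */
typedef struct {
    float   d;
    uint8_t qs[QK / 2];
} block_q4_0_sk;

/* Hypothetical Q8_0-style block: float scale, QK signed 8-bit quants. */
typedef struct {
    float  d;
    int8_t qs[QK];
} block_q8_0_sk;

/* Dot product of n values stored as n / QK blocks of each type.
   Within a block the products are summed as integers; the two scales
   are applied once per block. */
static float vec_dot_q4_0_q8_0_sk(int n, const block_q4_0_sk *x, const block_q8_0_sk *y) {
    assert(n % QK == 0);
    const int nb = n / QK;
    float sum = 0.0f;
    for (int i = 0; i < nb; i++) {
        int32_t isum = 0;
        for (int j = 0; j < QK / 2; j++) {
            const int v0 = (x[i].qs[j] & 0x0F) - 8; /* element 2j */
            const int v1 = (x[i].qs[j] >> 4) - 8;   /* element 2j + 1 */
            isum += v0 * y[i].qs[2*j] + v1 * y[i].qs[2*j + 1];
        }
        sum += x[i].d * y[i].d * (float) isum;
    }
    return sum;
}
```

Keeping the activations at 8 bits means that, in each product, essentially only the weights carry coarse 4-bit rounding error. The integer inner loop is also the part a SIMD implementation would vectorize.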
ggerganov force-pushed from 9056a24e to 01de5c54 2 years ago
sw commented on 2023-04-15
ggerganov committed 60f27ed8: ggml : fix q4_1 dot func
sw approved these changes on 2023-04-15
ggerganov merged commit e95b6554 into master 2 years ago
ggerganov deleted the q8_0 branch 2 years ago
mofosyne added the Tensor Encoding Scheme and Review Complexity : High labels
