llama.cpp
1.5 bit quantization
#5453
Merged

ggerganov merged 15 commits into master from ik/iq1_s
ikawrakow added the demo label
ikawrakow force-pushed from 5c4cb8dd to a4982831 1 year ago
ikawrakow force-pushed from 9803f7ad to 584b3692 1 year ago
iq1_s: WIP basics
80cd5bae
iq1_s: CUDA is working
a9d48e97
iq1_s: scalar CPU dot product
d94139bf
iq1_s: WIP AVX2 dot product - something is not right
592b3b26
Fix tests
5574533a
Fix shadow warnings
dc0b14be
Fix after merge with latest master
67e7c423
iq1_s: AVX2 finally works
2ffb05ac
iq1_s: ARM_NEON dot product. Works, but not very fast
77301492
iq1_s: better grid
307c5f61
iq1_s: use IQ2_XXS for attn_output
4be44b7c
ikawrakow force-pushed from 584b3692 to 4be44b7c 1 year ago
iq1_s: Metal basics
020b548e
iq1_s: Metal works, but quite slow
425c6bbb
iq1_s: Tests
f604a179
iq1_s: slightly faster dot product
5c977221
ikawrakow added the enhancement label
ikawrakow marked this pull request as ready for review 1 year ago
ikawrakow requested a review from ggerganov 1 year ago
ggerganov approved these changes on 2024-02-18
ggerganov merged bd2d4e39 into master 1 year ago
Artem-B commented on 2024-02-18
mofosyne added the Tensor Encoding Scheme label
mofosyne added the Review Complexity : High label
