llama.cpp
ggml : remove bit shuffling
#1405
Merged

ggerganov merged 32 commits into master from remove-vzip-2
ggerganov force pushed 3 years ago
ggerganov marked this pull request as ready for review 3 years ago
ggerganov requested a review from sw 3 years ago
ggerganov ggml : remove Q4_0 bit shuffling (ARM NEON)
5fa47bf6
ggerganov ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
844d2af8
ggerganov ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
fd2a137f
ggerganov ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
9f3285f7
ggerganov ggml : remove Q5_0 bit shuffling (ARM NEON)
aa78dfed
ggerganov ggml : 2x faster scalar implementations
b37a08f6
ggerganov ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
292a778c
ggerganov ggml : simplify scalar dot
caaacd57
ggerganov ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
0add6402
ggerganov ggml : fix Q4_1 quantization
9472d0ea
ggerganov ggml : update cuBLAS + normalize variable names
cdc96073
ggerganov ggml : remove Q4_2 mode
4bf1c8a4
ggerganov ggml : minor formatting
b08c39b1
ggerganov ggml : fix Q5_0 quantization
83674556
ggerganov scripts : add script for measuring the time per token
928d2f33
sw AVX implementations (#1370)
9e49d201
ggerganov ggml : uniform 5th bit extraction
489bd13f
ggerganov llama : produce error upon loading old model files
d52172a5
ggerganov llama : fix model magic/version write
09032e02
ggerganov ggml : speed-up Q5_0 + Q5_1 at 4 threads
b7ad385d
ggerganov ggml : preserve old Q4 and Q5 formats
695f3963
ggerganov ggml : simplify Q8_1 - no need for low / high sums anymore
582a39ff
ggerganov ggml : fix Q8_0 and Q8_1 rounding
66802448
ggerganov Revert "AVX implementations (#1370)"
bd5e3730
ggerganov ggml : fix AVX2 implementation
5bc286ab
ggerganov sha : update hashes for 7B and 13B
e038e01e
ggerganov force pushed to e038e01e 3 years ago
ggerganov readme : update timings + remove warning banner
51c25fd9
ggerganov llama : update v2 PR number to 1405
1c87847b
ggerganov ggml : fix WASM comments
832c53f4
sw commented on 2023-05-11
ggerganov ggml : back to original bit order
ca7f069f
ggerganov changed the title from "ggml : remove bit shuffling (non-breaking)" to "ggml : remove bit shuffling" 3 years ago
ggerganov added the performance label
ggerganov added the breaking change label
ggerganov readme : add note that Q4 and Q5 have been changed
b58b1f4b
ggerganov llama : fix return for unknown version
cbb6a3a7
ggerganov merged b9fd7eee into master 3 years ago
ggerganov deleted the remove-vzip-2 branch 3 years ago
Excigma commented on 2023-05-11
BobbyKay1 approved these changes on 2023-12-10
