llama.cpp
ggml : remove bit shuffling
#1405
Merged
Commits (32)
ggml : remove Q4_0 bit shuffling (ARM NEON)
  ggerganov committed 3 years ago
ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
  ggerganov committed 3 years ago
ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
  ggerganov committed 3 years ago
ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
  ggerganov committed 3 years ago
ggml : remove Q5_0 bit shuffling (ARM NEON)
  ggerganov committed 3 years ago
ggml : 2x faster scalar implementations
  ggerganov committed 3 years ago
ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
  ggerganov committed 3 years ago
ggml : simplify scalar dot
  ggerganov committed 3 years ago
ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
  ggerganov committed 3 years ago
ggml : fix Q4_1 quantization
  ggerganov committed 3 years ago
ggml : update cuBLAS + normalize variable names
  ggerganov committed 3 years ago
ggml : remove Q4_2 mode
  ggerganov committed 3 years ago
ggml : minor formatting
  ggerganov committed 3 years ago
ggml : fix Q5_0 quantization
  ggerganov committed 3 years ago
scripts : add script for measuring the time per token
  ggerganov committed 3 years ago
AVX implementations (#1370)
  ggerganov committed 3 years ago
ggml : uniform 5th bit extraction
  ggerganov committed 3 years ago
llama : produce error upon loading old model files
  ggerganov committed 3 years ago
llama : fix model magic/version write
  ggerganov committed 3 years ago
ggml : speed-up Q5_0 + Q5_1 at 4 threads
  ggerganov committed 3 years ago
ggml : preserve old Q4 and Q5 formats
  ggerganov committed 3 years ago
ggml : simplify Q8_1 - no need for low / high sums anymore
  ggerganov committed 3 years ago
ggml : fix Q8_0 and Q8_1 rounding
  ggerganov committed 3 years ago
Revert "AVX implementations (#1370)"
  ggerganov committed 3 years ago
ggml : fix AVX2 implementation
  ggerganov committed 3 years ago
sha : update hashes for 7B and 13B
  ggerganov committed 3 years ago
readme : update timings + remove warning banner
  ggerganov committed 3 years ago
llama : update v2 PR number to 1405
  ggerganov committed 3 years ago
ggml : fix WASM comments
  ggerganov committed 3 years ago
ggml : back to original bit order
  ggerganov committed 3 years ago
readme : add note that Q4 and Q5 have been changed
  ggerganov committed 3 years ago
llama : fix return for unknown version
  ggerganov committed 3 years ago