llama.cpp
ggml : remove bit shuffling
#1405
Merged

Commits
  • ggml : remove Q4_0 bit shuffling (ARM NEON)
    ggerganov committed 3 years ago
  • ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
    ggerganov committed 3 years ago
  • ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
    ggerganov committed 3 years ago
  • ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
    ggerganov committed 3 years ago
  • ggml : remove Q5_0 bit shuffling (ARM NEON)
    ggerganov committed 3 years ago
  • ggml : 2x faster scalar implementations
    ggerganov committed 3 years ago
  • ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
    ggerganov committed 3 years ago
  • ggml : simplify scalar dot
    ggerganov committed 3 years ago
  • ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
    ggerganov committed 3 years ago
  • ggml : fix Q4_1 quantization
    ggerganov committed 3 years ago
  • ggml : update cuBLAS + normalize variable names
    ggerganov committed 3 years ago
  • ggml : remove Q4_2 mode
    ggerganov committed 3 years ago
  • ggml : minor formatting
    ggerganov committed 3 years ago
  • ggml : fix Q5_0 quantization
    ggerganov committed 3 years ago
  • scripts : add script for measuring the time per token
    ggerganov committed 3 years ago
  • AVX implementations (#1370)
    ggerganov committed 3 years ago
  • ggml : uniform 5th bit extraction
    ggerganov committed 3 years ago
  • llama : produce error upon loading old model files
    ggerganov committed 3 years ago
  • llama : fix model magic/version write
    ggerganov committed 3 years ago
  • ggml : speed-up Q5_0 + Q5_1 at 4 threads
    ggerganov committed 3 years ago
  • ggml : preserve old Q4 and Q5 formats
    ggerganov committed 3 years ago
  • ggml : simplify Q8_1 - no need for low / high sums anymore
    ggerganov committed 3 years ago
  • ggml : fix Q8_0 and Q8_1 rounding
    ggerganov committed 3 years ago
  • Revert "AVX implementations (#1370)"
    ggerganov committed 3 years ago
  • ggml : fix AVX2 implementation
    ggerganov committed 3 years ago
  • sha : update hashes for 7B and 13B
    ggerganov committed 3 years ago
  • readme : update timings + remove warning banner
    ggerganov committed 3 years ago
  • llama : update v2 PR number to 1405
    ggerganov committed 3 years ago
  • ggml : fix WASM comments
    ggerganov committed 3 years ago
  • ggml : back to original bit order
    ggerganov committed 3 years ago
  • readme : add note that Q4 and Q5 have been changed
    ggerganov committed 3 years ago
  • llama : fix return for unknown version
    ggerganov committed 3 years ago