llama.cpp
1.5 bit quantization
#5453
Merged

Commits
  • iq1_s: WIP basics
    Iwan Kawrakow committed 1 year ago
  • iq1_s: CUDA is working
    Iwan Kawrakow committed 1 year ago
  • iq1_s: scalar CPU dot product
    Iwan Kawrakow committed 1 year ago
  • iq1_s: WIP AVX2 dot product - something is not right
    Iwan Kawrakow committed 1 year ago
  • Fix tests
    Iwan Kawrakow committed 1 year ago
  • Fix shadow warnings
    Iwan Kawrakow committed 1 year ago
  • Fix after merge with latest master
    Iwan Kawrakow committed 1 year ago
  • iq1_s: AVX2 finally works
    Iwan Kawrakow committed 1 year ago
  • iq1_s: ARM_NEON dot product. Works, but not very fast
    Iwan Kawrakow committed 1 year ago
  • iq1_s: better grid
    Iwan Kawrakow committed 1 year ago
  • iq1_s: use IQ2_XXS for attn_output
    Iwan Kawrakow committed 1 year ago
  • iq1_s: Metal basics
    Iwan Kawrakow committed 1 year ago
  • iq1_s: Metal works, but quite slow
    Iwan Kawrakow committed 1 year ago
  • iq1_s: Tests
    Iwan Kawrakow committed 1 year ago
  • iq1_s: slightly faster dot product
    Iwan Kawrakow committed 1 year ago