llama.cpp
744c0c73 - llama : rotate activations for better quantization (#21038)

Commit
52 days ago
llama : rotate activations for better quantization (#21038)

* llama : rotate activations for better quantization
* cont : rotate V more + refactor
* cont : rotate caches separately + support non-power-of-2 head sizes
* cont : simplify
* cont : add reference for V rotation
* cont : refactor
* cont : support context shift
* cont : consolidate
* cont : dedup + allow different types for the rotation matrix
* cont : add env variable to disable rotation
* cont : simplify attn rot kv cache logic + rename env
* cont : pre-compute the Hadamard matrices
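
The commit message names the technique (rotating activations with Hadamard matrices) but does not explain why it helps. The sketch below is not the code from this commit; it is a minimal illustration of the general idea, under stated assumptions. It assumes a power-of-2 vector size and the Sylvester construction (the commit also handles non-power-of-2 head sizes, which this sketch does not), and the names `hadamard`, `rotate`, and `dot` are hypothetical helpers. It shows two properties that make such a rotation attractive before quantization: an orthonormal rotation applied to both vectors leaves their dot product unchanged, and it spreads a single large outlier channel across all channels, which shrinks the peak magnitude a quantizer has to cover.

```python
import math


def hadamard(n):
    """Return the n x n orthonormal Hadamard matrix (Sylvester construction).

    Assumption for this sketch: n is a power of 2.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of 2")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: H_2m = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)  # normalize so that H @ H^T = I
    return [[x * scale for x in row] for row in h]


def rotate(vec, h):
    """Multiply the matrix h by the column vector vec."""
    return [sum(a * b for a, b in zip(row, vec)) for row in h]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


h = hadamard(8)
q = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 1.0]
k = [0.1, 0.2, 16.0, -0.3, 0.05, 0.4, -0.2, 0.1]  # one large outlier channel

q_rot = rotate(q, h)
k_rot = rotate(k, h)

# Rotating both vectors by the same orthonormal matrix preserves q . k.
print("dot before:", dot(q, k), "dot after:", dot(q_rot, k_rot))

# The outlier's energy is spread over all channels, so the peak shrinks.
print("peak before:", max(abs(x) for x in k),
      "peak after:", max(abs(x) for x in k_rot))
```

Because the normalized Hadamard matrix is symmetric and orthogonal, applying `rotate` a second time with the same matrix recovers the original vector, so the rotation itself loses no information; any error comes from the quantization step applied to the rotated values.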