llama.cpp
744c0c73 - llama : rotate activations for better quantization (#21038)

Commit

52 days ago

llama : rotate activations for better quantization (#21038)

* llama : rotate activations for better quantization
* cont : rotate V more + refactor
* cont : rotate caches separately + support non-power-of-2 head sizes
* cont : simplify
* cont : add reference for V rotation
* cont : refactor
* cont : support context shift
* cont : consolidate
* cont : dedup + allow different types for the rotation matrix
* cont : add env variable to disable rotation
* cont : simplify attn rot kv cache logic + rename env
* cont : pre-compute the Hadamard matrices
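
The commit message names Hadamard matrices as the rotation. As a rough illustration of why such a rotation can be applied before quantization, the sketch below shows two properties of an orthonormal Hadamard matrix: it preserves dot products (so attention scores computed from rotated vectors are unchanged), and it spreads a single large outlier across all components. This is not the code from the commit. The helper names (`hadamard`, `rotate`, `dot`) and the sample vectors are hypothetical, the construction covers only power-of-2 sizes, and whether the rotation actually lowers quantization error depends on the data and the quantization type.

```python
import math

def hadamard(n):
    """Orthonormal n x n Hadamard matrix via the Sylvester construction.

    Only power-of-2 sizes are handled in this sketch.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2 in this sketch")
    h = [[1.0]]
    while len(h) < n:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)  # normalize so that H @ H.T == I
    return [[x * scale for x in row] for row in h]

def rotate(h, v):
    """Matrix-vector product h @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in h]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors; k contains one large outlier at index 3.
q = [0.3, -0.1, 0.25, 0.5, -0.4, 0.2, 0.05, -0.15]
k = [0.1, -0.2, 0.15, 8.0, -0.05, 0.12, -0.18, 0.07]

H = hadamard(len(k))
q_rot = rotate(H, q)
k_rot = rotate(H, k)

# The rotation is orthonormal, so the dot product is preserved
# up to floating-point rounding.
print(dot(q, k), dot(q_rot, k_rot))

# The outlier's energy is spread over all components, so the
# largest absolute value shrinks.
print(max(abs(x) for x in k), max(abs(x) for x in k_rot))

# The Sylvester matrix is symmetric, so applying it twice
# recovers the original vector.
k_back = rotate(H, k_rot)
```

Because the rotated vector has a smaller peak magnitude, a quantizer that derives its scale from the largest absolute value in a block gets a finer step size. The error is then distributed across all components instead of wiping out the small ones.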

References

#21038 - llama : rotate activations for better quantization

Author

ggerganov

Parents

0356e33a
