llama : rotate activations for better quantization #21038
ggerganov force pushed from 7711b3a3 to 5e60035f 75 days ago
ggerganov marked this pull request as ready for review 75 days ago
CISC approved these changes on 2026-03-28
ggerganov force pushed from 5e60035f to e05a5045 74 days ago
ggerganov force pushed from e05a5045 to c35f75d0 72 days ago
pwilkin approved these changes on 2026-03-31
llama : rotate activations for better quantization (4d68f97e)
cont : rotate V more + refactor (cb6e21d8)
cont : rotate caches separately + support non-power-of-2 head sizes (d467d3d0)
cont : simplify (898a8fe6)
cont : add reference for V rotation (eaefe0f0)
cont : refactor (66b72b41)
cont : support context shift (62adb6a1)
cont : consolidate (d424a885)
cont : dedup + allow different types for the rotation matrix (69e476d3)
cont : add env variable to disable rotation (29f41968)
cont : simplify attn rot kv cache logic + rename env (8df24c07)
cont : pre-compute the Hadamard matrices (a0c4a2a7)
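The commit history mentions rotating activations and pre-computing Hadamard matrices. As an illustrative sketch only (not the code from this PR), the Python below shows the general idea under two assumptions: the vector length is a power of 2, and quantization is a simple per-vector absmax 8-bit scheme. Multiplying by the orthonormal matrix H/sqrt(n) spreads a single large outlier across all components, which shrinks the quantization step; because H/sqrt(n) is symmetric and orthogonal, applying it a second time undoes the rotation exactly. The helper names `hadamard`, `rotate`, `quantize_q8`, and `mse` are hypothetical.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def rotate(x, h):
    """Apply the orthonormal rotation (1/sqrt(n)) * H to x (this map is its own inverse)."""
    n = len(x)
    s = 1.0 / math.sqrt(n)
    return [s * sum(h[i][j] * x[j] for j in range(n)) for i in range(n)]


def quantize_q8(x):
    """Simplified per-vector absmax int8 quantization; returns dequantized values.

    Assumption: one scale for the whole vector. ggml's block formats quantize
    in small fixed-size blocks instead.
    """
    amax = max(abs(v) for v in x)
    if amax == 0:
        return [0.0] * len(x)
    d = amax / 127.0
    return [round(v / d) * d for v in x]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Small activations plus one large outlier in the last channel.
    x = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, 8.0]
    h = hadamard(len(x))

    direct = mse(x, quantize_q8(x))
    # Rotate, quantize in the rotated basis, then rotate back.
    rotated = mse(x, rotate(quantize_q8(rotate(x, h)), h))
    print(f"direct MSE:  {direct:.3e}")
    print(f"rotated MSE: {rotated:.3e}")
```

The dense O(n^2) product is used here for clarity; a practical implementation would more likely use a precomputed matrix multiply or a fast Walsh-Hadamard transform in O(n log n).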
ggerganov force pushed from c35f75d0 to a0c4a2a7 72 days ago
ggerganov merged 744c0c73 into master 71 days ago
ggerganov deleted the gg/attn-rot branch 71 days ago
Assignees: no one assigned