llama.cpp
986b6ce9 - ggml, llama : avoid heavy V transpose + improvements (#775)

Commit
2 years ago
ggml, llama : avoid heavy V transpose + improvements (#775)

ggml :
- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned

llama :
- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
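The "store transposed V in the KV cache" item can be illustrated with a small sketch in plain C. This is not the ggml or llama.cpp code from this commit: the helper names `store_v_transposed` and `attn_weighted_sum`, the flat `float` buffers, and the `[n_embd][n_ctx]` layout are assumptions chosen for illustration. The idea it tries to show is that each new token's V vector is written into the cache with a stride (one element per row), so the attention-weighted sum can later read contiguous rows without transposing the whole cache first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a ggml function): write the V vector of one
 * token into a cache that is stored transposed, i.e. laid out row-major
 * as [n_embd][n_ctx]. Element i of the token at position `pos` lands at
 * v_cache_t[i * n_ctx + pos], so this is a strided write with stride n_ctx. */
static void store_v_transposed(float *v_cache_t, const float *v,
                               size_t n_embd, size_t n_ctx, size_t pos) {
    assert(pos < n_ctx);
    for (size_t i = 0; i < n_embd; ++i) {
        v_cache_t[i * n_ctx + pos] = v[i];
    }
}

/* Hypothetical helper: out[i] = sum over t < n_past of w[t] * V[t][i].
 * Because the cache is already transposed, each output element is a dot
 * product between the weights and one contiguous row of the cache. */
static void attn_weighted_sum(float *out, const float *v_cache_t,
                              const float *w, size_t n_embd,
                              size_t n_ctx, size_t n_past) {
    assert(n_past <= n_ctx);
    for (size_t i = 0; i < n_embd; ++i) {
        const float *row = v_cache_t + i * n_ctx; /* contiguous in t */
        float acc = 0.0f;
        for (size_t t = 0; t < n_past; ++t) {
            acc += w[t] * row[t];
        }
        out[i] = acc;
    }
}
```

In this sketch the transposition cost is paid once per token at write time, as `n_embd` strided stores, instead of once per evaluation over the whole cache. That is presumably the kind of saving the commit message refers to, though the actual implementation works on ggml tensors and views rather than raw arrays.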