llama.cpp
986b6ce9 - ggml, llama : avoid heavy V transpose + improvements (#775)

Commit

2 years ago

ggml, llama : avoid heavy V transpose + improvements (#775)

ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
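
The idea of storing V transposed in the KV cache can be illustrated with a small standalone sketch. This is not the code from the commit and does not use the ggml API. The function names `store_v_transposed` and `weighted_sum_v`, and the flat `n_embd` x `n_ctx` float layout, are assumptions made for illustration. The point is that a copy which honours the destination stride can write each new token's V vector directly into a column of the cache, so the later attention-weighted sum reads contiguous rows and needs no separate transpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ggml): write one token's V vector
 * (n_embd contiguous floats) into column `pos` of a transposed cache
 * laid out as n_embd rows of n_ctx floats each. */
void store_v_transposed(float *cache, size_t n_ctx, size_t n_embd,
                        size_t pos, const float *v) {
    assert(pos < n_ctx);
    for (size_t i = 0; i < n_embd; ++i) {
        /* Destination stride between consecutive elements is n_ctx. */
        cache[i * n_ctx + pos] = v[i];
    }
}

/* Hypothetical helper: out[i] = sum over t < n_past of w[t] * V[t][i].
 * Because the cache is already transposed, each inner loop walks one
 * contiguous row of the cache. */
void weighted_sum_v(const float *cache, size_t n_ctx, size_t n_embd,
                    size_t n_past, const float *w, float *out) {
    assert(n_past <= n_ctx);
    for (size_t i = 0; i < n_embd; ++i) {
        const float *row = cache + i * n_ctx;
        float acc = 0.0f;
        for (size_t t = 0; t < n_past; ++t) {
            acc += w[t] * row[t];
        }
        out[i] = acc;
    }
}
```

In this sketch the transpose cost is paid once per token at store time (a strided write of `n_embd` values) rather than on every evaluation over the whole cached V.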

References

#775 - Avoid heavy V transpose operation + improvements

Author

ggerganov

Parents

34162989
