llama.cpp
517b7170 - cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)

Commit
6 days ago
Very similar implementation to the flash-attention chunking, with similar benefits.
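For readers unfamiliar with the term, the sketch below illustrates the general idea of chunked work distribution for a matmul. It is not the code from this commit. `chunked_matmul`, its parameters, and the row-wise split are all hypothetical names and choices made for illustration, and plain `std::thread` stands in for whatever threading the CPU backend actually uses. Rows of the output are split into chunks, and each thread claims the next unprocessed chunk from a shared atomic counter, so faster threads naturally take on more chunks.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not a ggml function: computes C = A * B for row-major
// matrices (A is m x k, B is k x n) by splitting the rows of C into chunks.
// Threads claim chunks from a shared atomic counter until none remain.
std::vector<float> chunked_matmul(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t m, std::size_t k, std::size_t n,
                                  std::size_t chunk_rows, unsigned n_threads) {
    assert(chunk_rows > 0 && n_threads > 0);
    assert(a.size() == m * k && b.size() == k * n);

    std::vector<float> c(m * n, 0.0f);
    const std::size_t n_chunks = (m + chunk_rows - 1) / chunk_rows;
    std::atomic<std::size_t> next_chunk{0};

    auto worker = [&]() {
        for (;;) {
            // Claim the next chunk index; each index is handed out exactly once.
            const std::size_t chunk =
                next_chunk.fetch_add(1, std::memory_order_relaxed);
            if (chunk >= n_chunks) break;

            const std::size_t row_begin = chunk * chunk_rows;
            const std::size_t row_end = std::min(row_begin + chunk_rows, m);

            // Chunks cover disjoint rows of C, so no locking is needed here.
            for (std::size_t i = row_begin; i < row_end; ++i) {
                for (std::size_t j = 0; j < n; ++j) {
                    float sum = 0.0f;
                    for (std::size_t p = 0; p < k; ++p) {
                        sum += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = sum;
                }
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; ++t) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return c;
}
```

Compared with giving each thread one fixed slice of rows, smaller chunks claimed on demand can balance load better when threads run at different speeds, at the cost of one atomic operation per chunk.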