llama.cpp
Commit 517b7170
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)
Commit
6 days ago
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)

Very similar implementation to the flash-attention chunking, with similar benefits.
References
#16833 - cpu: introduce chunking for repack matmuls and enable matmul-id chunking
Author
max-krasnyansky
Parents
835e918d