llama.cpp
Commit 517b7170
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)
Commit
6 days ago
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)

Very similar implementation to the flash-attention chunking, with similar benefits.
References
#16833 - cpu: introduce chunking for repack matmuls and enable matmul-id chunking
Author
max-krasnyansky
Parents
835e918d