llama.cpp
e8d91589
- metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)
Commit
2 years ago
metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)

* Somewhat faster f16 x f32 matrix multiply kernel
* Better use 32 thread groups for f16 x f32

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
References
#2951 - metal: somewhat faster f16 x f32 matrix multiply kernel
Author
ikawrakow
Parents
bce1fef3