llama.cpp
e8d91589 - metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)

2 years ago
metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)

* Somewhat faster f16 x f32 matrix multiply kernel
* Better use 32 thread groups for f16 x f32

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>