llama.cpp
metal: somewhat faster f16 x f32 matrix multiply kernel
#2951
Merged

Commits
  • Somewhat faster f16 x f32 matrix multiply kernel
    Iwan Kawrakow committed 2 years ago
  • Better use 32 thread groups for f16 x f32
    Iwan Kawrakow committed 2 years ago