llama.cpp
metal: somewhat faster f16 x f32 matrix multiply kernel
#2951
Merged

ikawrakow merged 2 commits into master from ik/metal_faster_mm_f16_f32
ikawrakow
Somewhat faster f16 x f32 matrix multiply kernel
af226bd2
ikawrakow requested a review from ggerganov 2 years ago
monatis
ggerganov
ggerganov approved these changes on 2023-09-01
Better use 32 thread groups for f16 x f32
cad50d19
ggerganov
ikawrakow merged e8d91589 into master 2 years ago
ikawrakow deleted the ik/metal_faster_mm_f16_f32 branch 2 years ago
