llama.cpp
metal: somewhat faster f16 x f32 matrix multiply kernel
#2951
Merged
ikawrakow merged 2 commits into master from ik/metal_faster_mm_f16_f32
Commit af226bd2: Somewhat faster f16 x f32 matrix multiply kernel
ikawrakow requested a review from ggerganov 2 years ago
ggerganov approved these changes on 2023-09-01
Commit cad50d19: Better use 32 thread groups for f16 x f32
ikawrakow merged e8d91589 into master 2 years ago
ikawrakow deleted the ik/metal_faster_mm_f16_f32 branch 2 years ago
Reviewers: ggerganov
Assignees: No one assigned
Labels: None yet
Milestone: No milestone