PR #7061 CUDA: generalize FP16 fattn vec kernel

JohannesGaessler merged 7 commits into ggml-org:master from JohannesGaessler:cuda-fa-no-tc-5

JohannesGaessler force pushed to 57bde8c2 1 year ago

48463c0b CUDA: generalize FP16 fattn vec kernel
86636bd1 disable unsupported head sizes for AMD in test
617f129e try AMD fix
d9bcb92f fix batch size 2-8
fa81c3a2 partially revert changes

mofosyne added the enhancement label

22727651 fix performance regression
fece1fe4 fix compiler warning

JohannesGaessler force pushed from 78ee06e5 to fece1fe4 1 year ago

mofosyne added the Review Complexity : High label

slaren approved these changes on 2024-05-09

JohannesGaessler merged a743d76a into master 1 year ago

Reviewers: slaren
Assignees: none
Labels: enhancement, Review Complexity : High
Milestone: none