PR #15813 CUDA: Conv2d Tensor Core

CUDA: conv2d with tensor core

mnehete32 committed 16 days ago

CUDA: conv2d added comment

mnehete32 committed 16 days ago

CUDA: conv2d support fp16 without wmma

mnehete32 committed 16 days ago

CUDA: conv2d using mma.cuh

mnehete32 committed 9 days ago

CUDA: conv2d convert int64_t to int

mnehete32 committed 9 days ago

CUDA: conv2d update block size

mnehete32 committed 8 days ago

Merge branch 'master' of https://github.com/mnehete32/llama.cpp into conv2d_tensor_core

mnehete32 committed 5 days ago

CUDA: conv2d performance optimization

mnehete32 committed 5 days ago

CUDA: conv2d minor fixes

mnehete32 committed 5 days ago

llama.cpp CUDA: Conv2d Tensor Core #15813 Closed