llama.cpp
cuda : add batched cuBLAS GEMM for faster attention
#3749
Merged
ggerganov merged 10 commits into master from cuda-batched-gemm
cmake : add helper for faster CUDA builds (8fb1be64)
batched : add NGL arg (6a30bf3e)
ggml : skip nops in compute_forward (8d8d54f8, sketched after this commit list)
cuda : minor indentation (84d4ca0e)
cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops) (c13fcfbf, sketched after this commit list)
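On 8d8d54f8: some graph nodes, such as GGML_OP_NONE and the view-style ops, only change tensor metadata, so the per-node compute step can return early instead of doing any work. A minimal sketch of that idea, not the actual ggml code; the helper name is a stand-in:

```cpp
// Minimal sketch of the "skip nops" idea, not the actual ggml code:
// metadata-only ops need no compute, so the per-node dispatch can bail
// out immediately. The helper name is a stand-in.
#include "ggml.h"

static bool node_needs_no_compute(const struct ggml_tensor * node) {
    switch (node->op) {
        case GGML_OP_NONE:      // placeholder node, nothing to do
        case GGML_OP_RESHAPE:   // metadata-only: result aliases the source
        case GGML_OP_VIEW:
        case GGML_OP_PERMUTE:
        case GGML_OP_TRANSPOSE:
            return true;
        default:
            return false;
    }
}
```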
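The core change (c13fcfbf) batches the many small attention matrix multiplications, whose src0 is F16 and src1 is F32, into a single cuBLAS call instead of looping over per-head GEMMs. Below is a minimal sketch of a batched FP16 GEMM through cublasGemmBatchedEx. It is not the PR's code: the function name and the contiguous buffer layout are illustrative, and both inputs are assumed to already be FP16 (the F32 src1 would need a conversion step first).

```cpp
// Hedged sketch: one cublasGemmBatchedEx call in place of a loop of
// per-matrix GEMMs. A, B, C are assumed to be contiguous FP16 device
// buffers holding `batch` independent column-major matrices.
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// C_i = A_i * B_i for i in [0, batch):  A_i: m x k,  B_i: k x n,  C_i: m x n
static void gemm_batched_f16(cublasHandle_t handle,
                             const half *A, const half *B, half *C,
                             int m, int n, int k, int batch) {
    // cublasGemmBatchedEx expects device-resident arrays of per-matrix
    // pointers, so build them on the host and copy them over.
    std::vector<const void *> hA(batch), hB(batch);
    std::vector<void *>       hC(batch);
    for (int i = 0; i < batch; ++i) {
        hA[i] = A + (size_t) i*m*k;
        hB[i] = B + (size_t) i*k*n;
        hC[i] = C + (size_t) i*m*n;
    }

    const void **dA; const void **dB; void **dC;
    cudaMalloc((void **) &dA, batch*sizeof(void *));
    cudaMalloc((void **) &dB, batch*sizeof(void *));
    cudaMalloc((void **) &dC, batch*sizeof(void *));
    cudaMemcpy(dA, hA.data(), batch*sizeof(void *), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), batch*sizeof(void *), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), batch*sizeof(void *), cudaMemcpyHostToDevice);

    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    cublasGemmBatchedEx(handle,
        CUBLAS_OP_N, CUBLAS_OP_N,
        m, n, k,
        &alpha,
        dA, CUDA_R_16F, m,
        dB, CUDA_R_16F, k,
        &beta,
        dC, CUDA_R_16F, m,
        batch,
        CUBLAS_COMPUTE_16F,
        CUBLAS_GEMM_DEFAULT_TENSOR_OP);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```

The device pointer arrays are what make the later "reduce mallocs" and "mem pool" commits relevant: building and freeing them on every matrix multiplication is avoidable overhead.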
ggerganov added the performance, high priority, need feedback and Nvidia GPU labels
KerfuffleV2 commented on 2023-10-23
Apply suggestions from code review (878aa4f2)
cuda : add ROCm / hipBLAS cublasGemmBatchedEx define (d4156690)
cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases (3d297c1a, sketched below)
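Commit 3d297c1a covers the case where nothing is broadcast, so every batch entry sits at a fixed offset from the previous one. The per-matrix pointer arrays are then unnecessary and cublasGemmStridedBatchedEx can be called directly with base pointers and strides. Again a minimal sketch under the same illustrative assumptions as above (FP16 inputs, contiguous column-major batches), not the PR's code:

```cpp
// Hedged sketch: uniform-stride batches need no pointer arrays; a single
// strided call covers all batch entries.
#include <cublas_v2.h>
#include <cuda_fp16.h>

// C_i = A_i * B_i, with matrix i starting at base + i*stride (column-major).
static void gemm_strided_batched_f16(cublasHandle_t handle,
                                     const half *A, const half *B, half *C,
                                     int m, int n, int k, int batch) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    cublasGemmStridedBatchedEx(handle,
        CUBLAS_OP_N, CUBLAS_OP_N,
        m, n, k,
        &alpha,
        A, CUDA_R_16F, m, (long long) m*k,   // stride between consecutive A_i
        B, CUDA_R_16F, k, (long long) k*n,   // stride between consecutive B_i
        &beta,
        C, CUDA_R_16F, m, (long long) m*n,   // stride between consecutive C_i
        batch,
        CUBLAS_COMPUTE_16F,
        CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```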
ggerganov force-pushed from 25a0b908 to 3d297c1a (1 year ago)
ggerganov commented on 2023-10-24
cuda : reduce mallocs in cublasGemmBatchedEx branch (27c34c01, sketched below)
cuda : add TODO for calling cublas from kernel + using mem pool (d798a17c)
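On 27c34c01 and d798a17c: the cublasGemmBatchedEx path needs three device arrays of per-batch pointers, and allocating them with separate cudaMalloc calls on every matrix multiplication adds overhead, hence the commit to reduce mallocs and the TODO about the memory pool. One way to cut the allocation count, shown purely as an illustrative sketch (the struct and names below are not from the PR):

```cpp
// Hedged sketch: carve all three per-batch pointer arrays out of one
// allocation instead of three cudaMalloc calls per mul_mat node. The PR's
// commit titles also point at reusing the CUDA memory pool instead.
#include <cuda_runtime.h>
#include <cstddef>

struct batched_ptrs {
    const void **a; // per-batch pointers into src0
    const void **b; // per-batch pointers into src1
    void       **c; // per-batch pointers into dst
    void        *raw;
};

static batched_ptrs alloc_batched_ptrs(int batch) {
    batched_ptrs p;
    cudaMalloc(&p.raw, 3 * (size_t) batch * sizeof(void *));
    p.a = (const void **) p.raw;
    p.b = p.a + batch;
    p.c = (void **) (p.b + batch);
    return p;
}
```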
ggerganov merged 2b4ea35e into master (1 year ago)
Reviewers: slaren, KerfuffleV2
Assignees: No one assigned
Labels: performance, high priority, need feedback, Nvidia GPU
Milestone: No milestone