llama.cpp
cuda : add batched cuBLAS GEMM for faster attention
#3749
Merged

ggerganov merged 10 commits into master from cuda-batched-gemm
ggerganov cmake : add helper for faster CUDA builds
8fb1be64
ggerganov batched : add NGL arg
6a30bf3e
ggerganov ggml : skip nops in compute_forward
8d8d54f8
ggerganov cuda : minor indentation
84d4ca0e
ggerganov cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
c13fcfbf
ggerganov added labels: performance, high priority, need feedback, Nvidia GPU
slaren
KerfuffleV2 commented on 2023-10-23
KerfuffleV2 Apply suggestions from code review
878aa4f2
ggerganov cuda : add ROCm / hipBLAS cublasGemmBatchedEx define
d4156690
ggerganov cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases
3d297c1a
ggerganov force-pushed from 25a0b908 to 3d297c1a 1 year ago
ggerganov commented on 2023-10-24
ggerganov cuda : reduce mallocs in cublasGemmBatchedEx branch
27c34c01
ggerganov cuda : add TODO for calling cublas from kernel + using mem pool
d798a17c
ggerganov merged 2b4ea35e into master 1 year ago
