llama.cpp
c13fcfbf
- cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
Commit
1 year ago
References
#3749 - cuda : add batched cuBLAS GEMM for faster attention
Author
ggerganov
Parents
84d4ca0e
Files (1)
ggml-cuda.cu