llama.cpp
CUDA performance optimization: asynchronous computation by using only one cudaStream
#1898

Merged

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-async-compute

ggerganov requested a review from slaren 3 years ago

slaren approved these changes on 2023-06-16

Only one CUDA stream per device for async compute

8a93a05a

JohannesGaessler force pushed from 4e85b43d to 8a93a05a 3 years ago

ggerganov approved these changes on 2023-06-17

JohannesGaessler merged 2c9380dd into master 3 years ago

Reviewers

ggerganov

slaren
