llama.cpp
CUDA performance optimization: asynchronous computation by using only one cudaStream
#1898
Merged

JohannesGaessler
ggerganov requested a review from slaren 2 years ago
slaren approved these changes on 2023-06-16
JohannesGaessler committed "Only one CUDA stream per device for async compute" (8a93a05a)
JohannesGaessler force pushed from 4e85b43d to 8a93a05a 2 years ago
ggerganov approved these changes on 2023-06-17
JohannesGaessler merged 2c9380dd into master 2 years ago
