llama.cpp
backend : offload large batches to GPU #6083
Merged

slaren merged 9 commits into master from sl/sched-auto-offload
slaren backend : offload large batches to GPU (5b6b4ac2)
slaren fix hip (c2dba045)
slaren code cleanup (3a774427)
slaren fix CUDA split buffers (c0fe6298)
slaren force-pushed from dc93f5ac to c0fe6298 1 year ago
JohannesGaessler commented on 2024-03-16
slaren Update ggml-backend-impl.h (8e717e8c)
slaren cuda : fix memset without set_device (9cba8a18)
slaren imatrix : remove sched affix from weight names (98090759)
slaren sched : add a new split if the current one has too many inputs (0661e6a1)
slaren marked this pull request as ready for review 1 year ago
slaren force-pushed 1 year ago
slaren update backends (cc9299ce)
slaren force-pushed to cc9299ce 1 year ago
ggerganov approved these changes on 2024-03-18
ggerganov added the high priority label
slaren merged 2bf8d0f7 into master 1 year ago
slaren deleted the sl/sched-auto-offload branch 1 year ago