llama.cpp
2bf8d0f7 - backend : offload large batches to GPU (#6083)

Commit

2 years ago

backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
  reduce max inputs per split
  more cleanup
* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

References

#6083 - backend : offload large batches to GPU

Author

slaren

Parents

496bc79b
