llama.cpp
llama : ggml-backend integration
#4766
Merged

slaren merged 39 commits into master from sl/backend-sched
slaren
slaren force pushed from 5a9f0712 to af8a3742 1 year ago
ggerganov added high priority
ggerganov added need feedback
JohannesGaessler
JohannesGaessler commented on 2024-01-04
JohannesGaessler
JohannesGaessler
slaren
slaren force pushed from af8a3742 to e7129358 1 year ago
slaren
ggerganov
JohannesGaessler
slaren
slaren
ggerganov
slaren llama : ggml-backend integration
33f0761e
slaren ggml-backend : add names to buffers
6483328f
slaren fix unmap after loading
a1ab35c6
ggerganov batched-bench : add tensor_split param
1fa7ee2e
slaren llama : check for null tensor_split
863ef455
slaren ggml-backend : increase GGML_MAX_BACKENDS
d1074593
cmp-nct
slaren force pushed from dcc0dada to 4d0d647a 1 year ago
slaren
slaren improve graph splitting, partial fix for --no-kv-offload
ece0b0d8
slaren force pushed from 4d0d647a to ece0b0d8 1 year ago
cmp-nct
JohannesGaessler
slaren
cmp-nct
JohannesGaessler
JohannesGaessler
slaren cuda : add ggml-backend split buffer support
2f2c3679
slaren cuda : do not create buffer types for devices that don't exist (fixes…
72b74f36
ggerganov ggml : fix null backend dereference (#4807)
f77c72f3
slaren test-backend-ops : check buffer allocation failures
7c16cf10
slaren Merge remote-tracking branch 'origin/master' into sl/backend-sched
87c8207a
slaren
JohannesGaessler
slaren
slaren llama : add cparam (split_mode) and command line argument (--split-mo…
5e879c99
slaren
JohannesGaessler
slaren ggml : fix mul_mat_id work size
ac145fd2
slaren llama : rewrite session kv load/set without graphs
444b975e
slaren minor
d41cef93
slaren llama : only initialize used backends, free backends on context free
5a62db30
slaren llama : abort ctx if cuda backend init fails
4813e175
slaren llama : rewrite lora with ggml-backend and compute on CPU
11583c14
slaren force pushed from d97c90c7 to 11583c14 1 year ago
slaren
slaren llama : only map to a backend buffer the region of the file mapping c…
4ed5f621
slaren opencl : add ggml-backend buffer type
fa762011
slaren marked this pull request as ready for review 1 year ago
slaren
slaren Merge remote-tracking branch 'origin/master' into sl/backend-sched
2e7814a8
slaren force pushed from 4850d04c to 2e7814a8 1 year ago
ggerganov requested a review from ggerganov 1 year ago
slaren
slaren cuda : only use batched_cublas with batched mat muls (fixes fp16 tg p…
5d2dffcf
slaren Merge remote-tracking branch 'origin/master' into sl/backend-sched
3cb1c1fb
slaren
slaren
sorasoras
ggerganov
ggerganov llama : on Metal, by default offload the full model
07a1b052
ggerganov
ReinForce-II
slaren
ggerganov
ggerganov
ggerganov metal : page align the data ptr (#4854)
3cd0cbb1
8XXD8
JohannesGaessler
JohannesGaessler commented on 2024-01-10
slaren Apply suggestions from code review
74066f8c
ggerganov
ggerganov approved these changes on 2024-01-10
JohannesGaessler
JohannesGaessler
slaren
cebtenzzre
cebtenzzre commented on 2024-01-10
JohannesGaessler
slaren
JohannesGaessler
slaren
JohannesGaessler
JohannesGaessler
slaren
JohannesGaessler
JohannesGaessler
slaren
slaren cuda : fix split buffer free
c522c112
JohannesGaessler
JohannesGaessler
slaren
slaren
JohannesGaessler
JohannesGaessler
slaren address review comments
9d4ba6ed
slaren llama-bench : add split-mode parameter
d83c0840
slaren fix whitespace
6dcc42bd
ggerganov
MaggotHATE
8XXD8
JohannesGaessler
JohannesGaessler
slaren opencl : fix double initialization
42aa835c
slaren Merge remote-tracking branch 'origin/master' into sl/backend-sched
c3681af7
slaren
JohannesGaessler
slaren
MaggotHATE
sorasoras
JohannesGaessler
slaren
JohannesGaessler
slaren server : add --split-mode parameter
c4867196
JohannesGaessler
ggerganov
slaren
ggerganov
JohannesGaessler
JohannesGaessler
slaren
ggerganov
slaren
ggerganov
slaren
ggerganov
sorasoras
calvintwr
slaren
slaren use async copy and compute to improve multi-gpu performance
23c14ef5
slaren force pushed from ca0c6dd4 to 23c14ef5 1 year ago
slaren use async memcpys to copy the graph outputs to the CPU
e73009ea
slaren
slaren fix opencl
1e7694ee
JohannesGaessler
sorasoras
slaren
sorasoras
sorasoras
JohannesGaessler
sorasoras
slaren
sorasoras
slaren
sorasoras
slaren Merge remote-tracking branch 'origin/master' into sl/backend-sched
458674c0
slaren use a host buffer for the cpu compute buffer for faster copies to the…
53ae0dd8
slaren force pushed from 30461434 to 53ae0dd8 1 year ago
slaren
ggerganov
slaren
ggerganov
slaren
ggerganov
slaren merged e7e4df03 into master 1 year ago
ikawrakow
ggerganov
mononoSaya
Green-Sky
ggerganov
Green-Sky
ikawrakow
Ph0rk0z
JohannesGaessler
Ph0rk0z
sorasoras
jukofyork
JohannesGaessler
Ph0rk0z
Ph0rk0z
Ph0rk0z
slaren deleted the sl/backend-sched branch 1 year ago

