llama.cpp
llama : ggml-backend integration #4766 (Merged)
slaren merged 39 commits into master from sl/backend-sched
slaren force-pushed from 5a9f0712 to af8a3742 1 year ago
ggerganov added the high priority label
ggerganov added the need feedback label
JohannesGaessler commented on 2024-01-04
slaren force-pushed from af8a3742 to e7129358 1 year ago
33f0761e llama : ggml-backend integration
6483328f ggml-backend : add names to buffers
a1ab35c6 fix unmap after loading
1fa7ee2e batched-bench : add tensor_split param
863ef455 llama : check for null tensor_split
d1074593 ggml-backend : increase GGML_MAX_BACKENDS
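For readers following along, here is a minimal sketch of the ggml-backend buffer API that these commits touch. It assumes the ggml-backend.h of this era; names such as ggml_backend_buffer_name come from the "add names to buffers" commit above, and exact signatures may have shifted since.

    // Minimal sketch, assuming ggml-backend.h from around this PR.
    #include "ggml.h"
    #include "ggml-backend.h"
    #include <stdio.h>

    int main(void) {
        ggml_backend_t backend = ggml_backend_cpu_init();

        // allocate a 1 MiB buffer from the backend's default buffer type
        ggml_backend_buffer_t buf = ggml_backend_buft_alloc_buffer(
                ggml_backend_get_default_buffer_type(backend), 1024*1024);

        // buffers are now identifiable by name, which helps when debugging
        // allocations spread across multiple backends
        printf("buffer: %s, size: %zu\n",
                ggml_backend_buffer_name(buf),
                ggml_backend_buffer_get_size(buf));

        ggml_backend_buffer_free(buf);
        ggml_backend_free(backend);
        return 0;
    }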
slaren force-pushed from dcc0dada to 4d0d647a 1 year ago
ece0b0d8 improve graph splitting, partial fix for --no-kv-offload
slaren force-pushed from 4d0d647a to ece0b0d8 1 year ago
2f2c3679 cuda : add ggml-backend split buffer support
72b74f36 cuda : do not create buffer types for devices that don't exist (fixes…
f77c72f3 ggml : fix null backend dereference (#4807)
7c16cf10 test-backend-ops : check buffer allocation failures
87c8207a Merge remote-tracking branch 'origin/master' into sl/backend-sched
5e879c99 llama : add cparam (split_mode) and command line argument (--split-mo…
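The split_mode cparam above surfaces as a field on llama_model_params. A hedged sketch of selecting it from the C API, assuming llama.h as of this PR (the enum constants were later renamed LLAMA_SPLIT_MODE_*; the model path is a placeholder):

    #include "llama.h"

    int main(void) {
        llama_backend_init(false); // false = no NUMA optimizations (signature of this era)

        struct llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 99;                // offload all layers
        mparams.split_mode   = LLAMA_SPLIT_LAYER; // or LLAMA_SPLIT_NONE / LLAMA_SPLIT_ROW

        struct llama_model * model = llama_load_model_from_file("model.gguf", mparams);
        if (model != NULL) {
            llama_free_model(model);
        }

        llama_backend_free();
        return 0;
    }

The same choice is exposed on the command line as --split-mode, spelled out in full by the later "server : add --split-mode parameter" commit below.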
ac145fd2 ggml : fix mul_mat_id work size
444b975e llama : rewrite session kv load/set without graphs
d41cef93 minor
5a62db30 llama : only initialize used backends, free backends on context free
4813e175 llama : abort ctx if cuda backend init fails
11583c14 llama : rewrite lora with ggml-backend and compute on CPU
slaren force-pushed from d97c90c7 to 11583c14 1 year ago
4ed5f621 llama : only map to a backend buffer the region of the file mapping c…
fa762011 opencl : add ggml-backend buffer type
slaren marked this pull request as ready for review 1 year ago
2e7814a8 Merge remote-tracking branch 'origin/master' into sl/backend-sched
slaren force-pushed from 4850d04c to 2e7814a8 1 year ago
ggerganov requested a review from ggerganov 1 year ago
5d2dffcf cuda : only use batched_cublas with batched mat muls (fixes fp16 tg p…
3cb1c1fb Merge remote-tracking branch 'origin/master' into sl/backend-sched
07a1b052 llama : on Metal, by default offload the full model
3cd0cbb1 metal : page align the data ptr (#4854)
JohannesGaessler commented on 2024-01-10
74066f8c Apply suggestions from code review
ggerganov approved these changes on 2024-01-10
cebtenzzre commented on 2024-01-10
c522c112 cuda : fix split buffer free
9d4ba6ed address review comments
d83c0840 llama-bench : add split-mode parameter
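With the llama-bench commit above, split modes can be compared in a single run. Assuming the -sm flag spelling used by llama-bench, an invocation like ./llama-bench -m model.gguf -ngl 99 -sm none,layer,row (model path illustrative) benchmarks all three modes, since llama-bench expands comma-separated parameter values into a test matrix.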
6dcc42bd fix whitespace
42aa835c opencl : fix double initialization
c3681af7 Merge remote-tracking branch 'origin/master' into sl/backend-sched
c4867196 server : add --split-mode parameter
23c14ef5 use async copy and compute to improve multi-gpu performance
slaren force-pushed from ca0c6dd4 to 23c14ef5 1 year ago
e73009ea use async memcpys to copy the graph outputs to the CPU
1e7694ee fix opencl
458674c0 Merge remote-tracking branch 'origin/master' into sl/backend-sched
53ae0dd8 use a host buffer for the cpu compute buffer for faster copies to the…
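These last commits are performance work: device<->host copies are queued asynchronously rather than blocking per tensor, and the CPU compute buffer is placed in pinned host memory so those copies run faster. A rough sketch of the pattern, assuming the ggml-backend.h / ggml-cuda.h of this era (signatures may have changed since):

    #include "ggml.h"
    #include "ggml-backend.h"
    #include "ggml-cuda.h"

    // queue non-blocking device->host reads, then synchronize once at the end
    void read_outputs_async(ggml_backend_t backend, struct ggml_tensor * out, float * dst) {
        // non-blocking copy, queued on the backend's stream
        ggml_backend_tensor_get_async(backend, out, dst, 0, ggml_nbytes(out));

        // ... queue further copies or compute here ...

        ggml_backend_synchronize(backend); // wait for everything queued above
    }

    // "use a host buffer for the cpu compute buffer": pinned (page-locked)
    // host memory speeds up the transfers; the 16 MiB size is illustrative
    ggml_backend_buffer_t alloc_pinned_compute_buffer(void) {
        return ggml_backend_buft_alloc_buffer(
                ggml_backend_cuda_host_buffer_type(), 16u*1024*1024);
    }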
slaren force-pushed from 30461434 to 53ae0dd8 1 year ago
slaren merged commit e7e4df03 into master 1 year ago
slaren deleted the sl/backend-sched branch 1 year ago
Reviewers: ggerganov, JohannesGaessler, cebtenzzre, bobqianic
Assignees: no one assigned
Labels: high priority, need feedback
Milestone: no milestone