llama.cpp
llama : ggml-backend integration #4766 (Merged)
slaren merged 39 commits into master from sl/backend-sched
slaren force-pushed from 5a9f0712 to af8a3742 1 year ago
ggerganov added the high priority label
ggerganov added the need feedback label
JohannesGaessler commented on 2024-01-04
slaren force-pushed from af8a3742 to e7129358 1 year ago
33f0761e llama : ggml-backend integration
6483328f ggml-backend : add names to buffers
a1ab35c6 fix unmap after loading
1fa7ee2e batched-bench : add tensor_split param
863ef455 llama : check for null tensor_split
d1074593 ggml-backend : increase GGML_MAX_BACKENDS
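For readers following along, here is a minimal sketch of the ggml-backend buffer API that these commits touch. It assumes the ggml-backend.h of this era; names such as ggml_backend_buffer_name come from the "add names to buffers" commit above, and exact signatures may have shifted since.

    // Minimal sketch, assuming ggml-backend.h from around this PR.
    #include "ggml.h"
    #include "ggml-backend.h"
    #include <stdio.h>

    int main(void) {
        ggml_backend_t backend = ggml_backend_cpu_init();

        // allocate a 1 MiB buffer from the backend's default buffer type
        ggml_backend_buffer_t buf = ggml_backend_buft_alloc_buffer(
                ggml_backend_get_default_buffer_type(backend), 1024*1024);

        // buffers are now identifiable by name, which helps when debugging
        // allocations spread across multiple backends
        printf("buffer: %s, size: %zu\n",
                ggml_backend_buffer_name(buf),
                ggml_backend_buffer_get_size(buf));

        ggml_backend_buffer_free(buf);
        ggml_backend_free(backend);
        return 0;
    }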
slaren force-pushed from dcc0dada to 4d0d647a 1 year ago
ece0b0d8 improve graph splitting, partial fix for --no-kv-offload
slaren force-pushed from 4d0d647a to ece0b0d8 1 year ago
2f2c3679 cuda : add ggml-backend split buffer support
72b74f36 cuda : do not create buffer types for devices that don't exist (fixes…
f77c72f3 ggml : fix null backend dereference (#4807)
7c16cf10 test-backend-ops : check buffer allocation failures
87c8207a Merge remote-tracking branch 'origin/master' into sl/backend-sched
5e879c99 llama : add cparam (split_mode) and command line argument (--split-mo…
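The split_mode cparam above surfaces as a field on llama_model_params. A hedged sketch of selecting it from the C API, assuming llama.h as of this PR (the enum constants were later renamed LLAMA_SPLIT_MODE_*; the model path is a placeholder):

    #include "llama.h"

    int main(void) {
        llama_backend_init(false); // false = no NUMA optimizations (signature of this era)

        struct llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 99;                // offload all layers
        mparams.split_mode   = LLAMA_SPLIT_LAYER; // or LLAMA_SPLIT_NONE / LLAMA_SPLIT_ROW

        struct llama_model * model = llama_load_model_from_file("model.gguf", mparams);
        if (model != NULL) {
            llama_free_model(model);
        }

        llama_backend_free();
        return 0;
    }

The same choice is exposed on the command line as --split-mode, spelled out in full by the later "server : add --split-mode parameter" commit below.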
ac145fd2 ggml : fix mul_mat_id work size
444b975e llama : rewrite session kv load/set without graphs
d41cef93 minor
5a62db30 llama : only initialize used backends, free backends on context free
4813e175 llama : abort ctx if cuda backend init fails
11583c14 llama : rewrite lora with ggml-backend and compute on CPU
slaren force-pushed from d97c90c7 to 11583c14 1 year ago
4ed5f621 llama : only map to a backend buffer the region of the file mapping c…
fa762011 opencl : add ggml-backend buffer type
slaren marked this pull request as ready for review 1 year ago
2e7814a8 Merge remote-tracking branch 'origin/master' into sl/backend-sched
slaren force-pushed from 4850d04c to 2e7814a8 1 year ago
ggerganov requested a review from ggerganov 1 year ago
5d2dffcf cuda : only use batched_cublas with batched mat muls (fixes fp16 tg p…
3cb1c1fb Merge remote-tracking branch 'origin/master' into sl/backend-sched
07a1b052 llama : on Metal, by default offload the full model
3cd0cbb1 metal : page align the data ptr (#4854)
JohannesGaessler commented on 2024-01-10
74066f8c Apply suggestions from code review
ggerganov approved these changes on 2024-01-10
cebtenzzre commented on 2024-01-10
c522c112 cuda : fix split buffer free
9d4ba6ed address review comments
d83c0840 llama-bench : add split-mode parameter
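With the llama-bench commit above, split modes can be compared in a single run. Assuming the -sm flag spelling used by llama-bench, an invocation like ./llama-bench -m model.gguf -ngl 99 -sm none,layer,row (model path illustrative) benchmarks all three modes, since llama-bench expands comma-separated parameter values into a test matrix.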
6dcc42bd fix whitespace
42aa835c opencl : fix double initialization
c3681af7 Merge remote-tracking branch 'origin/master' into sl/backend-sched
c4867196 server : add --split-mode parameter
23c14ef5 use async copy and compute to improve multi-gpu performance
slaren force-pushed from ca0c6dd4 to 23c14ef5 1 year ago
e73009ea use async memcpys to copy the graph outputs to the CPU
1e7694ee fix opencl
458674c0 Merge remote-tracking branch 'origin/master' into sl/backend-sched
53ae0dd8 use a host buffer for the cpu compute buffer for faster copies to the…
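These last commits are performance work: device<->host copies are queued asynchronously rather than blocking per tensor, and the CPU compute buffer is placed in pinned host memory so those copies run faster. A rough sketch of the pattern, assuming the ggml-backend.h / ggml-cuda.h of this era (signatures may have changed since):

    #include "ggml.h"
    #include "ggml-backend.h"
    #include "ggml-cuda.h"

    // queue non-blocking device->host reads, then synchronize once at the end
    void read_outputs_async(ggml_backend_t backend, struct ggml_tensor * out, float * dst) {
        // non-blocking copy, queued on the backend's stream
        ggml_backend_tensor_get_async(backend, out, dst, 0, ggml_nbytes(out));

        // ... queue further copies or compute here ...

        ggml_backend_synchronize(backend); // wait for everything queued above
    }

    // "use a host buffer for the cpu compute buffer": pinned (page-locked)
    // host memory speeds up the transfers; the 16 MiB size is illustrative
    ggml_backend_buffer_t alloc_pinned_compute_buffer(void) {
        return ggml_backend_buft_alloc_buffer(
                ggml_backend_cuda_host_buffer_type(), 16u*1024*1024);
    }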
slaren force-pushed from 30461434 to 53ae0dd8 1 year ago
slaren merged commit e7e4df03 into master 1 year ago
slaren deleted the sl/backend-sched branch 1 year ago
Reviewers: ggerganov, JohannesGaessler, cebtenzzre, bobqianic
Assignees: no one assigned
Labels: high priority, need feedback
Milestone: no milestone