llama : add pipeline parallelism support #6017
822121fb  llama : add pipeline parallelism support for batch processing with mu…
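This PR separates the logical batch size (n_batch, the most tokens a single llama_decode call accepts) from the physical micro-batch size (n_ubatch, the unit the scheduler pipelines across devices). A minimal sketch of configuring both through the public API; the model path, layer count, and sizes are illustrative placeholders:

```cpp
// Sketch: configure logical batch (n_batch) vs. physical micro-batch (n_ubatch).
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload all layers so multiple GPUs can pipeline

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx    = 4096;
    cparams.n_batch  = 2048; // logical: max tokens per llama_decode call
    cparams.n_ubatch = 512;  // physical: micro-batch pipelined across devices

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize, build a llama_batch, llama_decode ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```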
1ac668e4  server : add -ub, --ubatch-size parameter
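The new flag exposes n_ubatch on the server command line alongside the existing -b/--batch-size for n_batch. A hedged invocation sketch; the model path and sizes are placeholders, not recommendations:

```sh
./server -m model.gguf -c 4096 -b 4096 -ub 512
```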
4ddccc28  fix server embedding test

compilade dismissed these changes on 2024-03-12

937966d7  llama : fix Mamba inference for pipeline parallelism
00a415d1  llama : limit max batch size to n_batch
89bfa1f2  add LLAMA_SCHED_MAX_COPIES to configure the number of input copies fo…
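Going by the commit title, LLAMA_SCHED_MAX_COPIES is a build-time knob for how many copies of the input tensors the scheduler keeps, which bounds how many micro-batches can be in flight at once. A sketch of setting it at configure time; the value 4 is an assumption, not a recommendation:

```sh
cmake -B build -DLLAMA_SCHED_MAX_COPIES=4
cmake --build build --config Release
```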
slaren force-pushed to 89bfa1f2 1 year ago

aa1e2f8b  fix hip build
deb3e245  Merge remote-tracking branch 'origin/master' into sl/pipeline-paralle…
ead5c8b8  fix sycl build (disable cpy_tensor_async)
255c1ec1  fix hip build

compilade dismissed their stale review 1 year ago

44001533  llama : limit n_batch and n_ubatch to n_ctx during context creation
9e7cecc1  llama : fix norm backend
b25a0f19  batched-bench : sync after decode
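With pipelined execution, llama_decode may return before the backends have finished, so a benchmark that reads the clock right after it would under-report latency; this PR adds llama_synchronize to wait for pending work. A minimal timing sketch, assuming ctx and batch are already set up:

```cpp
#include "llama.h"
#include <chrono>
#include <cstdio>

// Sketch: measure decode latency correctly when execution is asynchronous.
void timed_decode(llama_context * ctx, llama_batch batch) {
    const auto t0 = std::chrono::steady_clock::now();

    llama_decode(ctx, batch);  // may return before the devices finish
    llama_synchronize(ctx);    // block until all pending work completes

    const auto t1 = std::chrono::steady_clock::now();
    printf("decode time: %.2f ms\n",
           std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```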
529e749e  swiftui : sync after decode
54cdd478  ggml : allow ggml_get_rows to use multiple threads if they are available
cda49d38  check n_ubatch >= n_tokens with non-causal attention
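Non-causal attention (used by embedding models such as BERT-style encoders) attends across the entire input, so a batch cannot be split into smaller micro-batches; the check rejects batches with more tokens than n_ubatch. A hedged sketch of context parameters for such a workload, with illustrative sizes:

```cpp
#include "llama.h"

// Sketch: for non-causal models every token must fit into one micro-batch,
// so keep n_ubatch equal to n_batch (2048 here is illustrative).
llama_context_params embedding_ctx_params() {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = 2048;
    cparams.n_ubatch = 2048;
    return cparams;
}
```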
slaren force-pushed to cda49d38 1 year ago

015e1bfe  llama : do not limit n_batch to n_ctx with non-causal attn
0d934ee5  server : construct batch with size of llama_n_batch
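The server now allocates its batch from the context's logical batch size rather than the context size. A sketch of the pattern with the public API; the n_seq_max value of 1 is an illustrative assumption:

```cpp
#include "llama.h"

// Sketch: size the token batch from the context's logical batch size.
// The caller releases it with llama_batch_free when done.
llama_batch make_server_batch(llama_context * ctx) {
    return llama_batch_init((int32_t) llama_n_batch(ctx), /*embd =*/ 0, /*n_seq_max =*/ 1);
}
```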
ggerganov approved these changes on 2024-03-13

slaren commented on 2024-03-13

3c38789f  ggml_backend_cpu_graph_compute : fix return value when alloc fails
9092883d  llama : better n_batch and n_ubatch comment
cb580a64  fix merge
1f564815  small fix
976176d0  reduce default n_batch to 2048
slaren merged f30ea47a into master 1 year ago
slaren deleted the sl/pipeline-parallelism branch 1 year ago