llama.cpp
llama : add pipeline parallelism support #6017
Merged

slaren merged 23 commits into master from sl/pipeline-parallelism
822121fb (slaren) llama : add pipeline parallelism support for batch processing with mu…
1ac668e4 (slaren) server : add -ub, --ubatch-size parameter
ggerganov added the high priority label
4ddccc28 (slaren) fix server embedding test
compilade dismissed these changes on 2024-03-12
937966d7 (compilade) llama : fix Mamba inference for pipeline parallelism
00a415d1 (slaren) llama : limit max batch size to n_batch
89bfa1f2 (slaren) add LLAMA_SCHED_MAX_COPIES to configure the number of input copies fo…
slaren force-pushed to 89bfa1f2 1 year ago
aa1e2f8b (slaren) fix hip build
deb3e245 (slaren) Merge remote-tracking branch 'origin/master' into sl/pipeline-paralle…
ead5c8b8 (slaren) fix sycl build (disable cpy_tensor_async)
255c1ec1 (slaren) fix hip build
compilade dismissed their stale review 1 year ago: "It works properly with Mamba."
compilade commented on 2024-03-13
44001533 (slaren) llama : limit n_batch and n_ubatch to n_ctx during context creation
9e7cecc1 (slaren) llama : fix norm backend
b25a0f19 (ggerganov) batched-bench : sync after decode
529e749e (ggerganov) swiftui : sync after decode
ggerganov commented on 2024-03-13
54cdd478 (slaren) ggml : allow ggml_get_rows to use multiple threads if they are available
cda49d38 (slaren) check n_ubatch >= n_tokens with non-causal attention
slaren force-pushed to cda49d38 1 year ago
015e1bfe (slaren) llama : do not limit n_batch to n_ctx with non-causal attn
0d934ee5 (ggerganov) server : construct batch with size of llama_n_batch
ggerganov approved these changes on 2024-03-13
slaren commented on 2024-03-13
3c38789f (slaren) ggml_backend_cpu_graph_compute : fix return value when alloc fails
9092883d (slaren) llama : better n_batch and n_ubatch comment
cb580a64 (slaren) fix merge
1f564815 (slaren) small fix
976176d0 (slaren) reduce default n_batch to 2048
slaren merged f30ea47a into master 1 year ago
slaren deleted the sl/pipeline-parallelism branch 1 year ago