llama : add pipeline parallelism support #6017
822121fb  llama : add pipeline parallelism support for batch processing with mu…
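This PR separates the logical batch size (n_batch, the most tokens a single llama_decode call accepts) from the physical micro-batch size (n_ubatch, the unit the scheduler pipelines across devices). A minimal sketch of configuring both through the public API; the model path, layer count, and sizes are illustrative placeholders:

```cpp
// Sketch: configure logical batch (n_batch) vs. physical micro-batch (n_ubatch).
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload all layers so multiple GPUs can pipeline

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx    = 4096;
    cparams.n_batch  = 2048; // logical: max tokens per llama_decode call
    cparams.n_ubatch = 512;  // physical: micro-batch pipelined across devices

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize, build a llama_batch, llama_decode ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```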
1ac668e4  server : add -ub, --ubatch-size parameter
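The new flag exposes n_ubatch on the server command line alongside the existing -b/--batch-size for n_batch. A hedged invocation sketch; the model path and sizes are placeholders, not recommendations:

```sh
./server -m model.gguf -c 4096 -b 4096 -ub 512
```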
4ddccc28  fix server embedding test

compilade dismissed these changes on 2024-03-12

937966d7  llama : fix Mamba inference for pipeline parallelism
00a415d1  llama : limit max batch size to n_batch
89bfa1f2  add LLAMA_SCHED_MAX_COPIES to configure the number of input copies fo…
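Going by the commit title, LLAMA_SCHED_MAX_COPIES is a build-time knob for how many copies of the input tensors the scheduler keeps, which bounds how many micro-batches can be in flight at once. A sketch of setting it at configure time; the value 4 is an assumption, not a recommendation:

```sh
cmake -B build -DLLAMA_SCHED_MAX_COPIES=4
cmake --build build --config Release
```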
slaren force-pushed to 89bfa1f2 1 year ago

aa1e2f8b  fix hip build
deb3e245  Merge remote-tracking branch 'origin/master' into sl/pipeline-paralle…
ead5c8b8  fix sycl build (disable cpy_tensor_async)
255c1ec1  fix hip build

compilade dismissed their stale review 1 year ago

44001533  llama : limit n_batch and n_ubatch to n_ctx during context creation
9e7cecc1  llama : fix norm backend
b25a0f19  batched-bench : sync after decode
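With pipelined execution, llama_decode may return before the backends have finished, so a benchmark that reads the clock right after it would under-report latency; this PR adds llama_synchronize to wait for pending work. A minimal timing sketch, assuming ctx and batch are already set up:

```cpp
#include "llama.h"
#include <chrono>
#include <cstdio>

// Sketch: measure decode latency correctly when execution is asynchronous.
void timed_decode(llama_context * ctx, llama_batch batch) {
    const auto t0 = std::chrono::steady_clock::now();

    llama_decode(ctx, batch);  // may return before the devices finish
    llama_synchronize(ctx);    // block until all pending work completes

    const auto t1 = std::chrono::steady_clock::now();
    printf("decode time: %.2f ms\n",
           std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```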
529e749e  swiftui : sync after decode
54cdd478  ggml : allow ggml_get_rows to use multiple threads if they are available
cda49d38  check n_ubatch >= n_tokens with non-causal attention
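Non-causal attention (used by embedding models such as BERT-style encoders) attends across the entire input, so a batch cannot be split into smaller micro-batches; the check rejects batches with more tokens than n_ubatch. A hedged sketch of context parameters for such a workload, with illustrative sizes:

```cpp
#include "llama.h"

// Sketch: for non-causal models every token must fit into one micro-batch,
// so keep n_ubatch equal to n_batch (2048 here is illustrative).
llama_context_params embedding_ctx_params() {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = 2048;
    cparams.n_ubatch = 2048;
    return cparams;
}
```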
slaren force-pushed to cda49d38 1 year ago

015e1bfe  llama : do not limit n_batch to n_ctx with non-causal attn
0d934ee5  server : construct batch with size of llama_n_batch
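The server now allocates its batch from the context's logical batch size rather than the context size. A sketch of the pattern with the public API; the n_seq_max value of 1 is an illustrative assumption:

```cpp
#include "llama.h"

// Sketch: size the token batch from the context's logical batch size.
// The caller releases it with llama_batch_free when done.
llama_batch make_server_batch(llama_context * ctx) {
    return llama_batch_init((int32_t) llama_n_batch(ctx), /*embd =*/ 0, /*n_seq_max =*/ 1);
}
```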
ggerganov approved these changes on 2024-03-13

slaren commented on 2024-03-13

3c38789f  ggml_backend_cpu_graph_compute : fix return value when alloc fails
9092883d  llama : better n_batch and n_ubatch comment
cb580a64  fix merge
1f564815  small fix
976176d0  reduce default n_batch to 2048
slaren merged f30ea47a into master 1 year ago
slaren deleted the sl/pipeline-parallelism branch 1 year ago