llama.cpp
llama : greatly reduce output buffer memory usage
#6122
Merged

ggerganov merged 26 commits into master from compilade/smaller-output-buffer
compilade llama : greatly reduce logits memory usage (1fd1918b)
compilade llama : more compact state saving and reloading (98914c0e)
compilade llama : fix lctx.n_outputs not being set before building graph (705d3937)
compilade perplexity : adapt to the logits API changes (25981fca)
compilade perplexity : fix Winogrande, use correct logits for second choice start (17b45c96)
compilade perplexity : normalize spaces and punctuation in Winogrande sentences (d0129e8e)
compilade llama : fix embedding conditions (487f89ec)
compilade llama : fix llama_get_embeddings_ith when the resulting id is 0 (408fcb0f)
compilade llama : fix wrong n_outputs in llama_set_inputs (e19cb3ae)
slaren commented on 2024-03-18
compilade llama : fix not-skipping outputs of non-causal models (a57fa7fa)
compilade llama : fix running a batch with n_outputs == 0 (711b0bcb)
compilade llama : keep same graph topology even when n_outputs == 0 (d1005022)
compilade ggml : saner ggml_can_repeat with empty tensors (99c37ccb)
compilade ggml : do not multi-thread ops returning empty tensors (6bf7f3f4)
slaren commented on 2024-03-18
slaren commented on 2024-03-18
compilade ggml : make ggml_is_empty public and work with views (09bb15a6)
compilade llama : use a vector for ctx->output_ids (4551e7eb)
compilade ggml : skip empty tensors in all backends (8b826c5b)
compilade llama : fix llama_output_reserve nullptr deref when new_size is 0 (d04cfaf2)
slaren commented on 2024-03-19
compilade perplexity : make Winogrande work as it does on master (8f70dcb0)
compilade llama : clearer error messages for invalid logits or embeddings ids (615a3a4a)
slaren commented on 2024-03-20
compilade llama : handle errors from llama_output_reserve at call sites (7d8d6b58)
compilade perplexity : make hellaswag and multiple-choice outputs identical to … (5f33a675)
compilade Merge branch 'master' into compilade/smaller-output-buffer (ffa9abd9)
slaren approved these changes on 2024-03-26
compilade llama : allow loading state saved with a different ctx size (e9095aca)
ggerganov llama : minor (5027d81f)
ggerganov approved these changes on 2024-03-26
compilade readme : update recent API changes, and warn about Vulkan (20248e80)
compilade force pushed to 20248e80 2 years ago
ggerganov merged 557410b8 into master 2 years ago
ggerganov deleted the compilade/smaller-output-buffer branch 2 years ago
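As a rough illustration of the idea behind commits such as "greatly reduce logits memory usage" and "use a vector for ctx->output_ids", here is a minimal Python sketch, assuming that the output buffer only stores one row per token that requested an output and that an index vector maps each batch position to its row (or to -1 when no output was requested). The helpers `build_output_ids` and `get_logits_ith` are hypothetical and only loosely mirror `llama_get_logits_ith`. The list of booleans is an assumed stand-in for the per-token `logits` flags of a `llama_batch`. This is not the llama.cpp implementation.

```python
from typing import List, Sequence


def build_output_ids(wants_output: Sequence[bool]) -> List[int]:
    """Hypothetical helper: map each batch position to a row of the compact
    output buffer, or to -1 if that position requested no output."""
    output_ids = [-1] * len(wants_output)
    n_outputs = 0
    for i, flag in enumerate(wants_output):
        if flag:
            # Rows are assigned in batch order, so the buffer needs only
            # n_outputs rows instead of one row per batch position.
            output_ids[i] = n_outputs
            n_outputs += 1
    return output_ids


def get_logits_ith(output_buffer: List[List[float]],
                   output_ids: List[int], i: int) -> List[float]:
    """Hypothetical lookup: return the logits row for batch position i,
    raising a descriptive error for invalid ids."""
    if i < 0 or i >= len(output_ids):
        raise IndexError(f"invalid logits id {i}, out of range [0, {len(output_ids)})")
    row = output_ids[i]
    if row < 0:
        raise ValueError(f"invalid logits id {i}, no output was requested for it")
    return output_buffer[row]
```

For a batch of five tokens where only positions 2 and 4 request outputs, `build_output_ids` returns `[-1, -1, 0, -1, 1]`, so a two-row buffer is enough. With a hypothetical vocabulary of 32,000 entries and a 512-token batch where only the last token needs logits, a buffer sized for every batch position would hold 16,384,000 floats, while the compact buffer holds 32,000.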
