llama : greatly reduce output buffer memory usage #6122
llama : greatly reduce logits memory usage
1fd1918b
llama : more compact state saving and reloading
98914c0e
llama : fix lctx.n_outputs not being set before building graph
705d3937
perplexity : adapt to the logits API changes
25981fca
perplexity : fix Winogrande, use correct logits for second choice start
17b45c96
perplexity : normalize spaces and punctuation in Winogrande sentences
d0129e8e
llama : fix embedding conditions
487f89ec
llama : fix llama_get_embeddings_ith when the resulting id is 0
408fcb0f
llama : fix wrong n_outputs in llama_set_inputs
e19cb3ae
slaren commented on 2024-03-18
llama : fix not-skipping outputs of non-causal models
a57fa7fa
llama : fix running a batch with n_outputs == 0
711b0bcb
llama : keep same graph topology even when n_outputs == 0
d1005022
ggml : saner ggml_can_repeat with empty tensors
99c37ccb
ggml : do not multi-thread ops returning empty tensors
6bf7f3f4
slaren commented on 2024-03-18
slaren commented on 2024-03-18
ggml : make ggml_is_empty public and work with views
09bb15a6
llama : use a vector for ctx->output_ids
4551e7eb
ggml : skip empty tensors in all backends
8b826c5b
llama : fix llama_output_reserve nullptr deref when new_size is 0
d04cfaf2
slaren commented on 2024-03-19
perplexity : make Winogrande work as it does on master
8f70dcb0
llama : clearer error messages for invalid logits or embeddings ids
615a3a4a
slaren commented on 2024-03-20
llama : handle errors from llama_output_reserve at call sites
7d8d6b58
perplexity : make hellaswag and multiple-choice outputs identical to …
5f33a675
Merge branch 'master' into compilade/smaller-output-buffer
ffa9abd9
slaren approved these changes on 2024-03-26
llama : allow loading state saved with a different ctx size
e9095aca
llama : minor
5027d81f
ggerganov approved these changes on 2024-03-26
readme : update recent API changes, and warn about Vulkan
20248e80
compilade force pushed to 20248e80 2 years ago
ggerganov merged 557410b8 into master 2 years ago
ggerganov
deleted the compilade/smaller-output-buffer branch 2 years ago