llama.cpp
134e6940 - llama : skip output reordering for single token batches (#17466)

Commit
19 days ago
llama : skip output reordering for single token batches (#17466)

This commit adds a check to skip the output reordering logic when n_outputs == 1. With a single output token, the data is trivially sorted and the reordering code is currently doing unnecessary work (resetting and rebuilding output_ids to the same values).

The motivation for this change is improved code clarity and avoiding confusion when debugging. While the performance impact is probably negligible, this unnecessary work happens on every decode call in llama-server when processing batches with single-token outputs.
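
The sketch below is a rough illustration of the shape of the change, not the actual llama.cpp diff: n_outputs and output_ids come from the commit message, while the helper name reorder_outputs, the out_ids vector, and the plain sort-then-rebuild loop are assumptions made for the example.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>
#include <algorithm>

// Toy model of the reordering step (not the real llama.cpp code): out_ids holds
// the batch positions whose outputs were produced, and output_ids maps a batch
// position back to its row in the output buffer. Rebuilding that map after
// sorting out_ids is pointless when there is only one output row.
static void reorder_outputs(std::vector<int32_t> & out_ids,
                            std::vector<int32_t> & output_ids) {
    const size_t n_outputs = out_ids.size();
    if (n_outputs <= 1) {
        return; // trivially sorted: skip the reset-and-rebuild below
    }

    std::sort(out_ids.begin(), out_ids.end());

    // reset and rebuild the position -> row mapping
    std::fill(output_ids.begin(), output_ids.end(), -1);
    for (size_t row = 0; row < n_outputs; ++row) {
        output_ids[out_ids[row]] = (int32_t) row;
    }
}

int main() {
    std::vector<int32_t> out_ids    = {3};           // single-token batch
    std::vector<int32_t> output_ids = {-1, -1, -1, 0};
    reorder_outputs(out_ids, output_ids);            // early return, map left untouched
    std::printf("row for position 3: %d\n", output_ids[3]);
}
```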