llama.cpp
470939d4 - common : preallocate sampling token data vector (#8363)

Commit

1 year ago

common : preallocate sampling token data vector (#8363)

Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and writing the data directly into it. Rudimentary profiling with `chrono` shows this block of code improving from ~500us/op to ~40us/op. Overall, this slightly improves sampling performance, which has a more substantial impact on the `examples/lookahead` implementation: I see a ~10% performance boost in lookahead inference.
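To illustrate the two filling strategies the message compares, here is a minimal sketch. It is not the code from the commit: `TokenData`, `fill_with_emplace_back`, and `fill_preallocated` are hypothetical names chosen for illustration, and the struct is only a stand-in for the per-token record used during sampling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-token sampling record
// (assumed layout, not the llama.cpp type).
struct TokenData {
    int32_t id;
    float   logit;
    float   p;
};

// Baseline: start empty and append one element per vocab entry.
// The vector may reallocate and move its contents several times as it grows.
std::vector<TokenData> fill_with_emplace_back(const std::vector<float> & logits) {
    std::vector<TokenData> out;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out.emplace_back(TokenData{static_cast<int32_t>(i), logits[i], 0.0f});
    }
    return out;
}

// Preallocated: size the vector to the vocab size once, then write by index.
// No reallocation happens inside the loop.
std::vector<TokenData> fill_preallocated(const std::vector<float> & logits) {
    std::vector<TokenData> out(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = TokenData{static_cast<int32_t>(i), logits[i], 0.0f};
    }
    return out;
}
```

Both functions produce the same contents; only the allocation pattern differs. Timing each one with `std::chrono::steady_clock` over a vocab-sized input is one way to reproduce the kind of rudimentary measurement described above, though the absolute numbers will depend on the compiler, flags, and hardware.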

References

#8363 - common : preallocate sampling token data vector

Author

kevmo314

Parents

6f0dbf6a
