llama.cpp
38dbdf4c - CUDA: Optimize PAD_REFLECT_1D (#15957)

Commit
3 days ago
CUDA: Optimize PAD_REFLECT_1D (#15957)

* CUDA: Optimize PAD_REFLECT_1D
feat: add more test cases for PAD_REFLECT_1D

* use fast_div to improve performance

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* optimize

* use a concise expression to further speedup the cuda kernel

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
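
For readers unfamiliar with the operation named in the commit message, the following is a minimal CPU reference sketch of 1D reflect padding. It is not the CUDA kernel from this commit and does not show the fast_div change. The function name `pad_reflect_1d_ref` and the parameters `p0` and `p1` (left and right padding) are hypothetical, and the sketch assumes the usual reflect convention in which the edge element itself is not repeated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for 1D reflect padding (illustration only,
// not the kernel from the commit). Assumes the edge element is not repeated.
std::vector<float> pad_reflect_1d_ref(const std::vector<float>& src,
                                      std::size_t p0, std::size_t p1) {
    const std::size_t n = src.size();
    // Reflection is only well defined when each pad is shorter than the row.
    assert(n > 0 && p0 < n && p1 < n);

    std::vector<float> dst(p0 + n + p1);
    for (std::size_t i = 0; i < dst.size(); ++i) {
        std::size_t j;
        if (i < p0) {
            j = p0 - i;                  // left pad: mirror around src[0]
        } else if (i < p0 + n) {
            j = i - p0;                  // interior: plain copy
        } else {
            j = 2 * (n - 1) - (i - p0);  // right pad: mirror around src[n-1]
        }
        dst[i] = src[j];
    }
    return dst;
}
```

Under these assumptions, padding the row `1 2 3 4` with `p0 = 2` and `p1 = 1` gives `3 2 1 2 3 4 3`. A GPU version would typically compute the same source index once per output element.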