llama.cpp
64ac9ab6 - CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181)

Committed 3 days ago
* CUDA: Fix CUB's argsort when nrows % block_size == 0 (CCCL < 3.1)

We wrongly calculated offset_grid as `ceildiv(nrows, block_size)`, while it must be `ceildiv(nrows + 1, block_size)`: the offset array holds nrows + 1 entries, because the end of the last row is read from `offset_iterator[nrows]`. When `nrows % block_size == 0`, the old grid spans exactly nrows threads, so `offset_iterator[nrows]` was left uninitialized.

Fixes #21162

* Reduce nrows in test case to 256, don't need 768
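The off-by-one can be reproduced without a GPU. The sketch below simulates the 1-D launch on the host, assuming one thread per offset entry, a block size of 256, and offsets of the form `row * ncols`. The names `ceildiv`, `fill_offsets`, `ncols` and `fixed` are hypothetical illustrations, not llama.cpp's actual code. An entry no thread writes stays `None`, which stands in for uninitialized device memory.

```python
# Hypothetical host-side simulation of the offset-initialization launch.
# Names (ceildiv, fill_offsets, ncols, fixed) are illustrative assumptions,
# not llama.cpp's real kernel or helpers.
from typing import List, Optional


def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def fill_offsets(nrows: int, ncols: int, block_size: int,
                 fixed: bool) -> List[Optional[int]]:
    """Simulate a 1-D grid in which each thread writes one segment offset.

    A segmented sort reads nrows + 1 offsets: row r spans
    [offsets[r], offsets[r + 1]), so offsets[nrows] is the end of the last row.
    None marks an entry that no thread wrote.
    """
    offsets: List[Optional[int]] = [None] * (nrows + 1)
    # Buggy launch sizes the grid from nrows; the fix uses nrows + 1.
    n = nrows + 1 if fixed else nrows
    grid = ceildiv(n, block_size)
    for block in range(grid):
        for thread in range(block_size):
            idx = block * block_size + thread
            if idx <= nrows:  # bounds check against the nrows + 1 entries
                offsets[idx] = idx * ncols
    return offsets


if __name__ == "__main__":
    for nrows in (255, 256):
        buggy = fill_offsets(nrows, ncols=8, block_size=256, fixed=False)
        fixed = fill_offsets(nrows, ncols=8, block_size=256, fixed=True)
        print(nrows, buggy[nrows], fixed[nrows])
    # prints:
    # 255 2040 2040
    # 256 None 2048
```

With 255 rows the buggy grid still has one spare thread that writes index 255, so the bug stays hidden; with 256 rows (and, under the assumed block size, with 768 as well) the last offset is never written unless the grid is sized from nrows + 1.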