llama.cpp
1f5accb8 - Fix garbled output with REPACK at high thread counts (#16956)

Commit

82 days ago

Fix garbled output with REPACK at high thread counts (#16956)

* Fix garbled output with REPACK at high thread counts

Fixed a race condition in the REPACK matrix multiplication code that caused garbled output when using 26 or more threads (the exact threshold is model-dependent).

The issue occurred because, at high thread counts, the code forced the chunk count to equal the thread count, creating many small chunks. After these chunks were aligned to NB_COLS boundaries, adjacent chunks could overlap, causing data corruption and race conditions.

The fix enforces a minimum chunk size based on NB_COLS and caps the maximum chunk count so that too many tiny chunks are never created, ensuring proper alignment without overlaps.

* Update ggml/src/ggml-cpu/repack.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/repack.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
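The chunking problem described in the commit message can be illustrated with a small model. This is a sketch in Python for brevity, not the code in `ggml/src/ggml-cpu/repack.cpp`. `make_chunks`, `naive_chunks`, and `is_valid_partition` are hypothetical helpers, and the row count (100), block width (8, standing in for NB_COLS), and thread count (26) are made-up example values. `naive_chunks` shows one way that independently aligning many tiny per-thread chunks can make neighbors overlap. `make_chunks` applies the two ideas from the fix, a minimum chunk size of one NB_COLS-wide block and a cap on the chunk count, by handing out whole blocks instead of rows.

```python
def naive_chunks(nr, nb_cols, nchunk):
    """Hypothetical failure mode: one tiny chunk per thread, each end aligned independently."""
    chunks = []
    for c in range(nchunk):
        start = (c * nr) // nchunk
        end = ((c + 1) * nr) // nchunk
        start -= start % nb_cols                      # round start down to a block boundary
        end = min(-(-end // nb_cols) * nb_cols, nr)   # round end up to a block boundary
        chunks.append((start, end))
    return chunks


def make_chunks(nr, nb_cols, requested_chunks):
    """Split rows [0, nr) into half-open chunks whose boundaries are multiples of nb_cols.

    Hypothetical sketch: every chunk gets at least one nb_cols-wide block, and the
    chunk count is capped at the number of blocks, so aligned chunks cannot overlap.
    """
    if nr <= 0 or nb_cols <= 0 or requested_chunks <= 0:
        return []
    nblocks = (nr + nb_cols - 1) // nb_cols   # aligned blocks; the last one may be partial
    nchunk = min(requested_chunks, nblocks)   # cap: never more chunks than blocks
    chunks = []
    for c in range(nchunk):
        b0 = (c * nblocks) // nchunk          # distribute whole blocks, not rows
        b1 = ((c + 1) * nblocks) // nchunk
        chunks.append((b0 * nb_cols, min(b1 * nb_cols, nr)))
    return chunks


def is_valid_partition(chunks, nr, nb_cols):
    """True if chunks are non-empty, contiguous, non-overlapping, aligned, and cover [0, nr)."""
    expected_start = 0
    for start, end in chunks:
        if start != expected_start or end <= start:
            return False
        if start % nb_cols != 0 or (end != nr and end % nb_cols != 0):
            return False
        expected_start = end
    return expected_start == nr


# 100 rows, a block width of 8, and 26 threads each asking for its own chunk.
print(is_valid_partition(naive_chunks(100, 8, 26), 100, 8))  # prints "False"
chunks = make_chunks(100, 8, 26)
print(len(chunks), is_valid_partition(chunks, 100, 8))        # prints "13 True"
```

In the naive version the first two threads both end up with rows 0 to 8. Because each chunk in `make_chunks` covers at least one whole block, the 26 requested chunks are capped at 13 for 100 rows, and every boundary except the final one falls on a multiple of the block width.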

References

#16956 - Fix garbled output with REPACK at high thread counts

Author

NoahOksuz

Parents

2759ccdb
