llama.cpp
66199c9f - ggml : use a simple std::thread in AMX without OpenMP (#20074)

Commit
72 days ago
ggml : use a simple std::thread in AMX without OpenMP (#20074)

Disabling OpenMP generally provides better inference performance (at
least in my testing) but the loading becomes slightly slower.

Benchmark results for `convert_B_packed_format()`:

Before this commit:

     N     K |  No OpenMP     OpenMP |    Diff |  Speedup
------------------------------------------------------------
   512  2880 |    640.9us    263.5us |  -58.9% |    0.41x
  2880  4096 |     2.55ms    261.7us |  -89.8% |    0.10x
201088  2880 |   256.44ms    21.61ms |  -91.6% |    0.08x
------------------------------------------------------------
Total: 325.43ms vs 31.05ms

After:

     N     K |  No OpenMP     OpenMP |    Diff |  Speedup
------------------------------------------------------------
   512  2880 |     1.49ms    263.5us |  -82.3% |    0.18x
  2880  4096 |     1.55ms    261.7us |  -83.1% |    0.17x
201088  2880 |    24.03ms    21.61ms |  -10.1% |    0.90x
------------------------------------------------------------
Total: 78.97ms vs 31.05ms

Tested with unsloth/gpt-oss-20b-GGUF:Q4_K_M.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
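
For readers unfamiliar with the pattern named in the title, here is a minimal sketch of splitting a loop across plain `std::thread` workers when OpenMP is not available. It is an illustration only, not the code from this commit: `parallel_for_sketch`, its parameters, and the chunking strategy are hypothetical, and exception handling is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not the llama.cpp API): split [0, n) into contiguous
// chunks and call fn(begin, end) once per chunk, each chunk on its own thread.
inline void parallel_for_sketch(std::size_t n, std::size_t n_threads,
                                const std::function<void(std::size_t, std::size_t)> & fn) {
    if (n == 0) {
        return;
    }
    // Use at least one worker and never more workers than items.
    n_threads = std::max<std::size_t>(1, std::min(n_threads, n));
    const std::size_t chunk = (n + n_threads - 1) / n_threads; // ceil(n / n_threads)

    std::vector<std::thread> workers;
    workers.reserve(n_threads - 1);
    for (std::size_t t = 1; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= n) {
            break; // rounding up the chunk size can leave trailing workers empty
        }
        const std::size_t end = std::min(begin + chunk, n);
        workers.emplace_back([&fn, begin, end] { fn(begin, end); });
    }

    // The calling thread processes the first chunk instead of idling.
    fn(0, std::min(chunk, n));

    // Assumption: fn does not throw; a real implementation would need to
    // join the workers on every exit path.
    for (auto & w : workers) {
        w.join();
    }
}
```

A caller would pass the row count, a thread count such as `std::thread::hardware_concurrency()`, and a lambda that processes rows `[begin, end)`. Spawning threads per call has a fixed cost, which may be one reason small inputs benefit less than large ones from this kind of fallback.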