llama.cpp
1fcdcc28 - cuda : performance optimizations (#1530)

Commit
2 years ago
cuda : performance optimizations (#1530) * xor hack * block y dim * loop unrolling * Fixed cmake LLAMA_CUDA_BY option * Removed hipblas compatibility code * Define GGML_CUDA_DMMV_BLOCK_Y if not defined * Fewer iters, more ops per iter * Renamed DMMV X/Y compilation options
Parents
Loading