llama.cpp
31914624 - vulkan: improve partial offloading performance on AMD (#19976)

Commit
2 days ago
vulkan: improve partial offloading performance on AMD (#19976)

* vulkan: fix and enable cpy_tensor_async function
* use transfer_queue for async transfers on AMD, synchronize with timeline semaphore
* update offload_op logic
* fix missing transfer submission
* disable async transfer queue on AMD GCN
* revert op batch size change
* fix cpy_tensor_async checks