@0cc4m This PR has two additional changes:

- `ggml_backend_vk_get_device_description` now translates the device index to the indexes given by `GGML_VK_VISIBLE_DEVICES` (I believe this was a bug).
- The name of the Vulkan devices is now `Vulkan<idx>`. This is the intended use for the name of these objects; a more detailed description can now be obtained using the ggml-backend device interface.

After this change it is possible to use Vulkan and CUDA in the same llama.cpp build (you may have to disable the NVIDIA devices in the Vulkan backend using the `GGML_VK_VISIBLE_DEVICES` environment variable).
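As a quick way to see the result, here is a minimal sketch that enumerates devices through the ggml-backend registry and prints both the short name (e.g. `Vulkan0`) and the detailed description; it assumes the `ggml_backend_dev_*` functions declared in `ggml-backend.h`:

```cpp
// List every registered device with its short name and detailed description.
#include "ggml-backend.h"
#include <cstdio>

int main(void) {
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: name=%s, description=%s\n",
               i, ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));
    }
    return 0;
}
```

If I read the env var handling correctly, `GGML_VK_VISIBLE_DEVICES` takes a comma-separated list of physical device indexes (e.g. `GGML_VK_VISIBLE_DEVICES=0,2`), so it can be used to hide the NVIDIA devices from the Vulkan backend when CUDA is also enabled.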
Seems to work fine (Win10), but I'm noticing another increase in layer size. Previously with `Mistral-Nemo-Instruct-2407.q5_k_l` I could offload 5 layers on 3 GB VRAM, now it's only 3. Is this expected? The total VRAM usage is pretty much the same as before the backend registry updates.
I don't think there are any changes here that could increase the memory usage. It's just exposing existing functionality of the vulkan backend through a different interface.
@slaren Thank you for implementing this. I can confirm it builds on Linux and that the code looks good. I can't fully test it currently since my server is still disassembled because I'm in the process of moving between cities. I should be able to reassemble it this weekend, but I'm still very busy. You can decide whether you prefer to wait or whether you think it's ready to merge.
Can you check the changes to `ggml_backend_vk_get_device_description`? Previously, it wouldn't translate the device index to the indexes given by `GGML_VK_VISIBLE_DEVICES`, which I believe was a bug. Other than that, I think that there is very little chance that this PR breaks anything.
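For context, the fix is roughly shaped like this; a sketch, not the exact diff, assuming the backend keeps the filtered list from `GGML_VK_VISIBLE_DEVICES` in something like `vk_instance.device_indices` (check `ggml-vulkan.cpp` for the real names):

```cpp
// Sketch: the public device index selects an entry in the filtered device
// list, and that entry is the actual physical device index. Before the fix,
// the public index was passed through directly, which goes wrong whenever
// GGML_VK_VISIBLE_DEVICES filters or reorders the devices.
void ggml_backend_vk_get_device_description(int device, char * description, size_t description_size) {
    GGML_ASSERT(device < (int) vk_instance.device_indices.size());
    int dev_idx = vk_instance.device_indices[device];
    ggml_vk_get_device_description(dev_idx, description, description_size);
}
```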
That was a bug, yeah.