llama.cpp
0f630fbc - cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)

Commit
1 year ago
cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)

* AMD ROCm: handle UMA memory VRAM expansions

  This resolves #2797 by allowing AMD ROCm users on UMA systems to dynamically expand the VRAM allocated to the GPU. Without this, AMD ROCm users with shared CPU/GPU memory are usually stuck with the BIOS-set (or otherwise fixed) framebuffer VRAM, making it impossible to load more than 1-2 layers.

  Note that the model is duplicated in RAM: it is loaded once for the CPU and then copied into a second set of allocations managed by the HIP UMA system. We can fix this later.

* clarify build process for ROCm on Linux with cmake
* avoid using deprecated ROCm hipMallocHost
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
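The gist of the change is to route device allocations through HIP managed memory when the UMA option is enabled, so allocations can spill beyond the fixed framebuffer carve-out. Below is a minimal sketch of that idea, not the exact diff: the GGML_HIP_UMA compile definition is an assumed name, and the macro aliases are modeled on how ggml-cuda.cu maps the CUDA API onto HIP for ROCm builds.

```cpp
// Sketch only: assumes a hypothetical GGML_HIP_UMA compile definition
// set when the build is configured with LLAMA_HIP_UMA=ON.
#include <hip/hip_runtime.h>

#ifdef GGML_HIP_UMA
// Managed (unified) memory: pages migrate between host and device on
// demand, so "VRAM" is no longer capped by the BIOS framebuffer reservation.
#define cudaMalloc(ptr, size) hipMallocManaged(ptr, size)
#else
// Regular device allocation from the fixed VRAM pool.
#define cudaMalloc(ptr, size) hipMalloc(ptr, size)
#endif

// Pinned host allocation via hipHostMalloc rather than the deprecated
// hipMallocHost that the commit message mentions avoiding.
#define cudaMallocHost(ptr, size) hipHostMalloc(ptr, size, hipHostMallocDefault)
```

Because the aliases sit above the CUDA-style call sites, the rest of the file can compile unchanged; only the allocator underneath is swapped.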
Author
ekg
Files changed
  • CMakeLists.txt
  • README.md
  • ggml-cuda.cu
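The last bullet of the commit message names LLAMA_HIP_UMA=ON, and the CMakeLists.txt entry above is presumably where that option is wired up. A minimal sketch of such wiring, assuming the same hypothetical GGML_HIP_UMA definition used in the allocation sketch (only the option name comes from the commit message; the rest is illustrative):

```cmake
# Sketch: expose a UMA toggle and forward it to the compiler.
# LLAMA_HIP_UMA is named in the commit message; the GGML_HIP_UMA
# definition it sets is an assumed name.
option(LLAMA_HIP_UMA "llama: use HIP unified memory architecture" OFF)

if (LLAMA_HIP_UMA)
    add_compile_definitions(GGML_HIP_UMA)
endif()
```

With wiring like this, a UMA-enabled ROCm configure would pass -DLLAMA_HIP_UMA=ON alongside the usual ROCm build flags.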