llama.cpp
dcb2ed48 - OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)

Commit

2 years ago

OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)

* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
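
As a rough illustration of the "single mul kernel call" item: when both tensors are contiguous, an element-wise multiply can run over one flat index range instead of being queued row by row. The following is a plain C reference sketch, not the commit's OpenCL code. The function name `mul_f32_ref`, its parameters, and the modulo-based repetition of `y` are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical CPU reference for an element-wise multiply kernel.
 * Each loop iteration stands in for one GPU work-item.
 * Assumption: y holds ky elements and is repeated (broadcast) across
 * the n elements of x, so y is indexed with i % ky. */
void mul_f32_ref(const float *x, const float *y, float *dst,
                 size_t n, size_t ky) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = x[i] * y[i % ky];
    }
}
```

With `n` set to the total element count of a contiguous tensor, a single call covers every row, which is the kind of queueing saving the commit message describes.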

References

#1653 - OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel

Author

0cc4m

Parents

d8bd0013
