llama.cpp
4db56412 - opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181)

Commit

71 days ago

opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181)

* Add mul_mm_f16_f32_kq_kqv kernel
* Add ggml_cl_mul_mat_kq_kqv_adreno func
* fix whitespace
* remove unused variable
* remove redundant
* refactor and clean up
* remove trailing whitespace

References

#17181 - opencl: add kernel to handle mat mul in attention to improve encoding speed

Author

shaofeiqi

Parents

72bd7321
