Clblast fixes + enhancements to save VRAM and offload more layers #1675
Use events instead of clFinish, where possible
ebc5d065
OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
97c5cca4
Reduce queueing overhead for contiguous tensors by using single mul k…
ac6b49ed
Merge remote-tracking branch 'origin/master' into opencl-dev
49aaf083
Adapt to #1612 cl_mem malloc changes
5e1eecfe
Reduce code duplication between cuda and opencl branches
457aaf5b
Improve implementation
24239f0d
Clblast fixes + enhancements to save VRAM:
59fe1687
Merge branch 'master' into concedo-opencl-dev
2b700749
change max value size_t to use limits
64e3e745
0cc4m commented on 2023-06-04
removed flags from the CL pool malloc, apply code tidying suggestions.
f6431ded
0cc4m requested changes on 2023-06-06
Update ggml-opencl.cpp
b6dd367b
0cc4m approved these changes on 2023-06-06
0cc4m merged d5b111f5 into master 2 years ago
LostRuins deleted the concedo-opencl-dev branch 2 years ago