OpenCL dequant_mul_mat #1459
0cc4m marked this pull request as ready for review 2 years ago
Move back to C++ for OpenCL
a7e3bee4
Refactor OpenCL code to work more like the CUDA code, add missing fun…
17e53dbb
Fix bugs in dequant_mul_mat code
5f610c90
Fix dequant_mul_mat kernel
8c7a7cea
Add remaining dequant_mul_mat functions
cb588e2a
Fix CMakeLists.txt
19683803
Generate dequant_mul_mat kernels from simple templates
915d0d11
Fix error in convert f16 to f32 kernel call
cda2d488
Fix tensor load to device
42e1a2ba
Deduplicate dequant kernels
457eff92
Fix convert_row_f16 kernel issue
e41a7ae4
Add OpenCL compile options
a1657d02
Use compile args for preprocessing constants
b6b39960
0cc4m force pushed from fb638fa8 to b6b39960 2 years ago
Explicitly set GEMM type
18e9dd87
Only copy f16/f32 buffer if not already on GPU
4a559514
SlyEcho requested changes on 2023-05-21
change to fprintf
e1ee2810
SlyEcho approved these changes on 2023-05-22
Restore default platform + device selection by id behavior
4dfd4fe1
SlyEcho requested changes on 2023-05-22
Small compiler warning fixes
cb28080a
SlyEcho approved these changes on 2023-05-22
SlyEcho merged 2e6cd4b0 into master 2 years ago
0cc4m deleted the opencl-dev branch 2 years ago