opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno #18970
opencl: add `copy_to_contiguous` and utilize mm kernels
6d0a567b
opencl: only copy to cont for f32 and f16 tensors
b773905f
opencl: use cont mm for fallback when dst is large
861c9815
opencl: use nb local to copy-to-cont
ca8a5064
opencl: use local offset as well
f04a782c
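For readers unfamiliar with the term, a copy-to-contiguous step gathers a strided (non-contiguous) tensor into a densely packed buffer so that a general matrix-multiplication kernel can consume it. The sketch below is a host-side C illustration only, not the OpenCL kernel from this PR. The function name `copy_to_contiguous_f32` is hypothetical, and it assumes the ggml convention that `ne[i]` is the element count and `nb[i]` the byte stride of dimension `i`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the PR: gather a strided 4-D f32
 * tensor into a densely packed buffer. ne[i] is the element count and
 * nb[i] the byte stride of dimension i (dimension 0 is innermost). */
static void copy_to_contiguous_f32(const void *src, float *dst,
                                   const size_t ne[4], const size_t nb[4]) {
    const char *base = (const char *)src;
    size_t out = 0; /* running index into the packed destination */
    for (size_t i3 = 0; i3 < ne[3]; i3++)
    for (size_t i2 = 0; i2 < ne[2]; i2++)
    for (size_t i1 = 0; i1 < ne[1]; i1++)
    for (size_t i0 = 0; i0 < ne[0]; i0++) {
        /* Locate the source element by its byte strides. */
        const char *p = base + i0 * nb[0] + i1 * nb[1]
                             + i2 * nb[2] + i3 * nb[3];
        memcpy(&dst[out++], p, sizeof(float));
    }
}
```

A transposed view is a typical non-contiguous input: its `nb[0]` is larger than the element size, so a kernel that assumes packed rows cannot read it directly without a copy like this one.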
lhez marked this pull request as ready for review 7 days ago