cuda: optimize SOLVE_TRI using registers and FMAF #17703
ggml-cuda: optimize solve_tri_f32_fast and fix stride handling
9e398502
Merge branch 'ggml-org:master' into TRI_SOLVE
4c9c6348
Small cleanup
b14882fc
Remove comments in solve_tri.cu
68881efb
Merge branch 'ggml-org:master' into TRI_SOLVE
a29836b7
Merge branch 'ggml-org:master' into TRI_SOLVE
642e898a
Update ggml/src/ggml-cuda/solve_tri.cu
c55b5bf9
Update ggml/src/ggml-cuda/solve_tri.cu
2fd92648
Update ggml/src/ggml-cuda/solve_tri.cu
b27ce89a
Merge branch 'ggml-org:master' into TRI_SOLVE
12d108ab
Use const for variables in solve_tri.cu
ec9b6f97
Replace fmaf with more readable code
a34a45a3
remove last fmaf
4a637096