llama.cpp
cuda: optimize SOLVE_TRI using registers and FMAF
#17703
Merged

wsbagnsv1 ggml-cuda: optimize solve_tri_f32_fast and fix stride handling (9e398502)
wsbagnsv1 Merge branch 'ggml-org:master' into TRI_SOLVE (4c9c6348)
wsbagnsv1 Small cleanup (b14882fc)
wsbagnsv1 Remove comments in solve_tri.cu (68881efb)
wsbagnsv1 Merge branch 'ggml-org:master' into TRI_SOLVE (a29836b7)
wsbagnsv1 Merge branch 'ggml-org:master' into TRI_SOLVE (642e898a)
CISC requested a review from JohannesGaessler 107 days ago
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler commented on 2025-12-04
wsbagnsv1 Update ggml/src/ggml-cuda/solve_tri.cu (c55b5bf9)
wsbagnsv1 Update ggml/src/ggml-cuda/solve_tri.cu (2fd92648)
wsbagnsv1 Update ggml/src/ggml-cuda/solve_tri.cu (b27ce89a)
wsbagnsv1 Merge branch 'ggml-org:master' into TRI_SOLVE (12d108ab)
wsbagnsv1 Use const for variables in solve_tri.cu (ec9b6f97)
wsbagnsv1 Replace fmaf with more readable code (a34a45a3)
JohannesGaessler commented on 2025-12-04
wsbagnsv1 remove last fmaf (4a637096)
JohannesGaessler approved these changes on 2025-12-08
JohannesGaessler merged commit 5814b4dc into master 101 days ago
