llama.cpp
SOLVE_TRI CUDA kernel for small matrices
#17457
Merged

am17an merged 17 commits into ggml-org:master from pwilkin:solve_tri_cuda
pwilkin
pwilkin SOLVE_TRI CUDA kernel for small matrices
0e6fd866
pwilkin requested a review from slaren 39 days ago
github-actions added testing
github-actions added Nvidia GPU
github-actions added ggml
wsbagnsv1
am17an
am17an commented on 2025-11-24
pwilkin Changes from review
48369633
pwilkin
am17an optimize
42d6d582
am17an
am17an
pwilkin Merge pull request #2 from am17an/solve_tri_cuda_opt
084d650d
am17an
am17an commented on 2025-11-24
pwilkin Remove unrolls
002d26e2
am17an
am17an commented on 2025-11-24
pwilkin Refactor using `if constexpr`
e21a0f8e
pwilkin Change to switch
b2d870ef
pwilkin
am17an
am17an approved these changes on 2025-11-24
pwilkin clang-format
376d4beb
am17an requested a review from JohannesGaessler 38 days ago
theo77186
pwilkin
pwilkin Add guards
4e8524c5
pwilkin
JohannesGaessler
JohannesGaessler commented on 2025-11-24
JohannesGaessler
pwilkin Add fixes from code review
baa58137
pwilkin
jeffbolznv
pwilkin
pwilkin Remove unneeded division by zero guard
c5cd33ad
pwilkin requested a review from ggerganov 36 days ago
pwilkin
pwilkin But not like this...
6b11712c
pwilkin Move second sync outside of loop
f19cdf8c
pwilkin
wsbagnsv1
pwilkin
pwilkin
pwilkin Move to column-based.
3a24c92b
pwilkin
pwilkin
am17an
pwilkin Cleanup
6bf2328e
pwilkin
pwilkin Correct clang-format
18fb1380
am17an
am17an approved these changes on 2025-11-27
wsbagnsv
wsbagnsv
pwilkin Minor
ea4dc88a
jeffbolznv
theo77186
pwilkin
wsbagnsv1
pwilkin
am17an merged cd0e3a7a into master 35 days ago
JohannesGaessler
JohannesGaessler commented on 2025-11-27
JohannesGaessler
wsbagnsv1
pwilkin
wsbagnsv1
wsbagnsv1
JohannesGaessler
wsbagnsv1
JohannesGaessler
wsbagnsv1
