PR #17457 SOLVE_TRI CUDA kernel for small matrices

am17an merged 17 commits into ggml-org:master from pwilkin:solve_tri_cuda
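As the name suggests, SOLVE_TRI is a triangular solve: given a triangular matrix A and a right-hand side B, it computes X with A X = B. The sketch below is a minimal CPU reference in C, not the CUDA kernel from this PR. It assumes a lower-triangular A with a non-zero diagonal, row-major storage, and a B with k columns that is overwritten in place with X. The function name `solve_tri_lower` and its argument layout are hypothetical.

```c
#include <assert.h>

/* Hypothetical CPU reference (not the PR's CUDA kernel).
 * Solves A * X = B by forward substitution, assuming:
 *   A: n x n, row-major, lower triangular, non-zero diagonal
 *   B: n x k, row-major; overwritten in place with X
 * Each column of B is an independent right-hand side, so the outer
 * loop could be parallelized across columns. */
static void solve_tri_lower(int n, int k, const float *A, float *B) {
    for (int c = 0; c < k; ++c) {               /* one RHS column at a time */
        for (int i = 0; i < n; ++i) {
            float sum = B[i * k + c];
            for (int j = 0; j < i; ++j) {       /* subtract already-solved x_j */
                sum -= A[i * n + j] * B[j * k + c];
            }
            assert(A[i * n + i] != 0.0f);       /* zero-pivot guard (debug builds only) */
            B[i * k + c] = sum / A[i * n + i];
        }
    }
}
```

For example, with A = [[2,0,0],[1,1,0],[3,2,4]] and B = [2,3,19]^T, calling `solve_tri_lower(3, 1, A, B)` leaves B = [1,2,3]^T. Row i depends on every earlier row of the same column, so the inner loops are inherently sequential; the available parallelism lies across right-hand-side columns (and across batched matrices).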

Commit 0e6fd866: SOLVE_TRI CUDA kernel for small matrices

pwilkin requested a review from slaren 167 days ago

github-actions added the testing, Nvidia GPU, and ggml labels

am17an commented on 2025-11-24

Commit 48369633: Changes from review

Commit 42d6d582: optimize

Commit 084d650d: Merge pull request #2 from am17an/solve_tri_cuda_opt

am17an commented on 2025-11-24

Commit 002d26e2: Remove unrolls

am17an commented on 2025-11-24

Commit e21a0f8e: Refactor using `if constexpr`

Commit b2d870ef: Change to switch

am17an approved these changes on 2025-11-24

Commit 376d4beb: clang-format

am17an requested a review from JohannesGaessler 166 days ago

Commit 4e8524c5: Add guards

JohannesGaessler commented on 2025-11-24

Commit baa58137: Add fixes from code review

Commit c5cd33ad: Remove unneeded division by zero guard

pwilkin requested a review from ggerganov 164 days ago

Commit 6b11712c: But not like this...

Commit f19cdf8c: Move second sync outside of loop

Commit 3a24c92b: Move to column-based.

Commit 6bf2328e: Cleanup

Commit 18fb1380: Correct clang-format

am17an approved these changes on 2025-11-27

Commit ea4dc88a: Minor

am17an merged cd0e3a7a into master 163 days ago

JohannesGaessler commented on 2025-11-27

Reviewers: am17an, JohannesGaessler, darkbasic, slaren, ggerganov

Assignees: none

Labels: testing, Nvidia GPU, ggml

Milestone: none

Repository: llama.cpp (status: merged)