SOLVE_TRI CUDA kernel for small matrices #17457
SOLVE_TRI CUDA kernel for small matrices (0e6fd866)
am17an commented on 2025-11-24
Changes from review (48369633)
optimize (42d6d582)
Merge pull request #2 from am17an/solve_tri_cuda_opt (084d650d)
am17an commented on 2025-11-24
Remove unrolls (002d26e2)
am17an commented on 2025-11-24
Refactor using `if constexpr` (e21a0f8e)
Change to switch (b2d870ef)
am17an approved these changes on 2025-11-24
clang-format (376d4beb)
Add guards (4e8524c5)
Add fixes from code review (baa58137)
Remove unneeded division by zero guard (c5cd33ad)
But not like this... (6b11712c)
Move second sync outside of loop (f19cdf8c)
Move to column-based. (3a24c92b)
Cleanup (6bf2328e)
Correct clang-format (18fb1380)
am17an approved these changes on 2025-11-27
Minor (ea4dc88a)
am17an merged cd0e3a7a into master 34 days ago
Assignees: No one assigned
Labels: testing, Nvidia GPU, ggml