llama.cpp
Add support for CUMSUM and TRI for CUDA.
#17584
Merged

pwilkin merged 25 commits into ggml-org:master from pwilkin:tri_cumsum_cuda
pwilkin
pwilkin Add support for CUMSUM and TRI for CUDA.
d138a03d
pwilkin requested a review from ggerganov 21 days ago
github-actions added testing
github-actions added Nvidia GPU
github-actions added ggml
pwilkin Minor optimizations.
67207d21
pwilkin requested a review from am17an 21 days ago
pwilkin requested a review from JohannesGaessler 21 days ago
pwilkin Correct warp_prefix_inclusive_sum in float2 variant to return float2
fab00294
am17an
wsbagnsv1
JohannesGaessler
JohannesGaessler commented on 2025-11-29
JohannesGaessler
pwilkin Optimize TRI
51c40a5a
pwilkin Whitespace
c30f5654
pwilkin
pwilkin Fix strides.
31b55fab
pwilkin Implement double loop
d1ca1c25
pwilkin Whitespace
5289b530
pwilkin
pwilkin Fix HIP compilation bugs
f422ba8e
pwilkin
gabe-l-hart
gabe-l-hart
gabe-l-hart commented on 2025-12-01
am17an
pwilkin
pwilkin Optimizations + big case performance tests
df917ccf
pwilkin Implement using CUB with fallback to custom kernel
76382d79
pwilkin
pwilkin Remove error message.
01d4033e
am17an
pwilkin
am17an
am17an commented on 2025-12-03
pwilkin Fixes from code review
10a2ea9d
pwilkin
pwilkin
pwilkin Comment out CPU-unsupported F16/BF16 cases to fix CI
7a83b056
pwilkin
CISC
pwilkin Fine, you win :P
bbe37435
pwilkin
CISC
CISC commented on 2025-12-04
am17an
am17an commented on 2025-12-03
pwilkin Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS
069413ab
IMbackK
pwilkin Vary warp-size based on physical warp size
5aa7438e
pwilkin
pwilkin Add GGML_UNUSED_VARS in tri as well
579eba6e
IMbackK
JohannesGaessler
JohannesGaessler commented on 2025-12-04
pwilkin Use constexpr and call prefix_inclusive with warp_size template param
08b3f2d2
pwilkin Update ggml/src/ggml-cuda/cumsum.cu
9cd0eff1
pwilkin Apply suggestions from code review
9574264c
pwilkin Change to tid % warp_size
efd619a6
IMbackK
IMbackK requested changes on 2025-12-04
IMbackK
IMbackK requested changes on 2025-12-04
pwilkin Fix strides; hardcode mask; add ggml_lane_mask_t
86a0853f
pwilkin
JohannesGaessler
JohannesGaessler commented on 2025-12-04
pwilkin Missing renames, remove unused get_warp_mask(), explicit calls to ggm…
de45c632
JohannesGaessler
JohannesGaessler commented on 2025-12-04
pwilkin Too hasty...
8a7375c8
IMbackK
IMbackK approved these changes on 2025-12-04
JohannesGaessler
JohannesGaessler approved these changes on 2025-12-04
pwilkin
pwilkin
pwilkin merged 96fe9bad into master 15 days ago
jacekpoplawski
