llama.cpp
Add support for CUMSUM and TRI for CUDA. #17584 (Merged)
Commits (25)
Add support for CUMSUM and TRI for CUDA. (pwilkin, 26 days ago)
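
The PR's two new CUDA ops are an inclusive cumulative sum (CUMSUM) and a triangular-matrix op (TRI). As a reference for what the scan computes, here is a minimal sketch of per-row inclusive prefix-sum semantics, assuming contiguous f32 rows; the kernel name and the one-thread-per-row layout are illustrative only, and the PR's actual kernels are warp-parallel.

```cuda
#include <cuda_runtime.h>

// Reference semantics only: dst[row][i] = src[row][0] + ... + src[row][i].
__global__ void cumsum_ref_f32(const float * src, float * dst,
                               const int ncols, const int nrows) {
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) {
        return;
    }
    float acc = 0.0f;
    for (int col = 0; col < ncols; ++col) {
        acc += src[row * ncols + col];
        dst[row * ncols + col] = acc;
    }
}
```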
Minor optimizations. (pwilkin, 26 days ago)
Correct warp_prefix_inclusive_sum in float2 variant to return float2 (pwilkin, 26 days ago)
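
A warp-level inclusive prefix sum is the usual building block for a CUDA cumsum kernel. Below is a minimal sketch of what a function like warp_prefix_inclusive_sum may look like, using the standard Kogge-Stone shuffle scan; the fixed 32-lane mask and the independent per-component float2 scan are assumptions, not the PR's exact code.

```cuda
#include <cuda_runtime.h>

// Kogge-Stone inclusive scan across one warp: after the loop, lane i holds
// the sum of lanes 0..i. Assumes a full 32-lane warp.
__device__ __forceinline__ float warp_prefix_inclusive_sum(float x) {
    const int lane = threadIdx.x % 32;
#pragma unroll
    for (int offset = 1; offset < 32; offset <<= 1) {
        const float y = __shfl_up_sync(0xffffffffu, x, offset);
        if (lane >= offset) {
            x += y;
        }
    }
    return x;
}

// float2 variant returning float2, per the commit message; this sketch scans
// the two components independently, which may differ from the real kernel.
__device__ __forceinline__ float2 warp_prefix_inclusive_sum(float2 v) {
    return make_float2(warp_prefix_inclusive_sum(v.x),
                       warp_prefix_inclusive_sum(v.y));
}
```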
Optimize TRI (pwilkin, 24 days ago)
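
For TRI, the core of the kernel is a per-element triangle test. A minimal sketch, assuming a lower-triangular variant over a contiguous f32 matrix; the real op presumably also covers upper/strict variants, strided tensors, and other dtypes.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Keep elements on or below the diagonal, zero the rest.
__global__ void tri_lower_f32(const float * src, float * dst,
                              const int ncols, const int nrows) {
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y;
    if (col >= ncols || row >= nrows) {
        return;
    }
    const int64_t i = (int64_t) row * ncols + col;
    dst[i] = (col <= row) ? src[i] : 0.0f;
}
```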
Whitespace (pwilkin, 24 days ago)
Fix strides. (pwilkin, 24 days ago)
Implement double loop (pwilkin, 23 days ago)
Whitespace (pwilkin, 23 days ago)
Fix HIP compilation bugs (pwilkin, 23 days ago)
Optimizations + big case performance tests (pwilkin, 23 days ago)
Implement using CUB with fallback to custom kernel (pwilkin, 23 days ago)
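
The CUB path presumably delegates the scan to cub::DeviceScan::InclusiveSum and keeps the hand-written kernel for builds without CUB. A sketch of that shape for a single contiguous row; the USE_CUB guard, the cumsum_fallback_f32 kernel, and the stream-ordered allocation here are assumptions (ggml has its own build flags and memory pool).

```cuda
#include <cuda_runtime.h>
#ifdef USE_CUB
#include <cub/cub.cuh>
#endif

// Fallback: one thread scans the row sequentially (correct, not fast).
__global__ void cumsum_fallback_f32(const float * src, float * dst, const int n) {
    if (threadIdx.x != 0 || blockIdx.x != 0) {
        return;
    }
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += src[i];
        dst[i] = acc;
    }
}

static void cumsum_row_f32(const float * src, float * dst, const int n, cudaStream_t stream) {
#ifdef USE_CUB
    // First call sizes CUB's scratch buffer, second call runs the scan.
    void * tmp = nullptr;
    size_t tmp_bytes = 0;
    cub::DeviceScan::InclusiveSum(tmp, tmp_bytes, src, dst, n, stream);
    cudaMallocAsync(&tmp, tmp_bytes, stream);
    cub::DeviceScan::InclusiveSum(tmp, tmp_bytes, src, dst, n, stream);
    cudaFreeAsync(tmp, stream);
#else
    cumsum_fallback_f32<<<1, 1, 0, stream>>>(src, dst, n);
#endif
}
```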
Remove error message. (pwilkin, 23 days ago)
Fixes from code review (pwilkin, 22 days ago)
Comment out CPU-unsupported F16/BF16 cases to fix CI (pwilkin, 21 days ago)
Fine, you win :P (pwilkin, 21 days ago)
Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS (pwilkin, 21 days ago)
Vary warp-size based on physical warp size (pwilkin, 21 days ago)
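
Varying the warp size matters because this code also builds for HIP, where AMD wave64 GPUs have 64 lanes per "warp". A sketch of querying the physical value at runtime; a later commit in this PR routes such lookups through explicit calls to ggml_cuda_info(), which caches device properties rather than querying per call.

```cuda
#include <cuda_runtime.h>

// 32 on NVIDIA GPUs; 64 on most AMD GCN/CDNA devices when built with HIP.
static int get_physical_warp_size(int device) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    return prop.warpSize;
}
```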
Add GGML_UNUSED_VARS in tri as well (pwilkin, 21 days ago)
Use constexpr and call prefix_inclusive with warp_size template param (pwilkin, 21 days ago)
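
Passing the warp size as a template parameter makes the scan loop bound a compile-time constant, so the shuffle loop can be fully unrolled, while the host still dispatches on the physical value. A minimal sketch of that dispatch pattern; the kernel and launcher names are illustrative, and the <64> instantiation is only meaningful on wave64 HIP devices.

```cuda
#include <cuda_runtime.h>

template <int warp_size>
__global__ void scan_one_warp_f32(const float * src, float * dst, const int n) {
    const int lane = threadIdx.x % warp_size;  // cf. the later "tid % warp_size" commit
    float x = lane < n ? src[lane] : 0.0f;
#pragma unroll
    for (int offset = 1; offset < warp_size; offset <<= 1) {  // constexpr bound: fully unrolled
        // Hardcoded 32-bit mask; see the ggml_lane_mask_t note below for the
        // wave64 caveat on HIP.
        const float y = __shfl_up_sync(0xffffffffu, x, offset);
        if (lane >= offset) {
            x += y;
        }
    }
    if (lane < n) {
        dst[lane] = x;
    }
}

static void launch_scan(const float * src, float * dst, const int n,
                        const int physical_warp_size, cudaStream_t stream) {
    if (physical_warp_size == 64) {
        scan_one_warp_f32<64><<<1, 64, 0, stream>>>(src, dst, n);  // HIP wave64
    } else {
        scan_one_warp_f32<32><<<1, 32, 0, stream>>>(src, dst, n);
    }
}
```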
Update ggml/src/ggml-cuda/cumsum.cu (pwilkin, 21 days ago)
Apply suggestions from code review (pwilkin, 21 days ago)
Change to tid % warp_size (pwilkin, 21 days ago)
Fix strides; hardcode mask; add ggml_lane_mask_t (pwilkin, 21 days ago)
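
The ggml_lane_mask_t addition is likely about shuffle masks: CUDA's __shfl_*_sync intrinsics take a 32-bit lane mask, but HIP wave64 needs 64 bits. A sketch of what such a typedef could look like; the exact definition and the GGML_FULL_MASK constant here are assumptions, not the PR's code.

```cuda
#include <cstdint>

#if defined(GGML_USE_HIP)
typedef uint64_t ggml_lane_mask_t;  // 64 lanes per wave on AMD wave64 hardware
#define GGML_FULL_MASK 0xffffffffffffffffULL
#else
typedef uint32_t ggml_lane_mask_t;  // 32 lanes per warp on NVIDIA hardware
#define GGML_FULL_MASK 0xffffffffu
#endif
```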
Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info() (pwilkin, 21 days ago)
Too hasty... (pwilkin, 21 days ago)