llama.cpp
Add support for CUMSUM and TRI for CUDA.
#17584
Merged

Commits
  • Add support for CUMSUM and TRI for CUDA.
    pwilkin committed 26 days ago
  • Minor optimizations.
    pwilkin committed 26 days ago
  • Correct warp_prefix_inclusive_sum in float2 variant to return float2
    pwilkin committed 26 days ago
  • Optimize TRI
    pwilkin committed 24 days ago
  • Whitespace
    pwilkin committed 24 days ago
  • Fix strides.
    pwilkin committed 24 days ago
  • Implement double loop
    pwilkin committed 23 days ago
  • Whitespace
    pwilkin committed 23 days ago
  • Fix HIP compilation bugs
    pwilkin committed 23 days ago
  • Optimizations + big case performance tests
    pwilkin committed 23 days ago
  • Implement using CUB with fallback to custom kernel
    pwilkin committed 23 days ago
  • Remove error message.
    pwilkin committed 23 days ago
  • Fixes from code review
    pwilkin committed 22 days ago
  • Comment out CPU-unsupported F16/BF16 cases to fix CI
    pwilkin committed 21 days ago
  • Fine, you win :P
    pwilkin committed 21 days ago
  • Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS
    pwilkin committed 21 days ago
  • Vary warp-size based on physical warp size
    pwilkin committed 21 days ago
  • Add GGML_UNUSED_VARS in tri as well
    pwilkin committed 21 days ago
  • Use constexpr and call prefix_inclusive with warp_size template param
    pwilkin committed 21 days ago
  • Update ggml/src/ggml-cuda/cumsum.cu
    pwilkin committed 21 days ago
  • Apply suggestions from code review
    pwilkin committed 21 days ago
  • Change to tid % warp_size
    pwilkin committed 21 days ago
  • Fix strides; hardcode mask; add ggml_lane_mask_t
    pwilkin committed 21 days ago
  • Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()
    pwilkin committed 21 days ago
  • Too hasty...
    pwilkin committed 21 days ago
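For reference, here is a minimal sketch of the warp-shuffle inclusive prefix sum that several of the commits above revolve around (`warp_prefix_inclusive_sum`, the hardcoded lane mask, the warp-size handling). This illustrates the general technique only and is not the PR's actual kernel; the name `cumsum_rows` and the fixed 32-lane warp size are assumptions made for the sketch — per the commits, the real code varies the warp size to match the physical one (e.g. 64 lanes on some AMD GPUs under HIP) and handles non-contiguous strides.

```cuda
#include <cuda_runtime.h>

constexpr int WARP_SIZE = 32; // sketch assumes 32-lane warps and a matching full mask

// Warp-level inclusive prefix sum (Hillis-Steele) via shuffle intrinsics.
// Each lane ends up with the sum of its own value and all lower lanes' values.
__device__ float warp_prefix_inclusive_sum(float v) {
    const int lane = threadIdx.x % WARP_SIZE;
#pragma unroll
    for (int offset = 1; offset < WARP_SIZE; offset *= 2) {
        const float up = __shfl_up_sync(0xFFFFFFFFu, v, offset, WARP_SIZE);
        if (lane >= offset) {
            v += up;
        }
    }
    return v;
}

// One warp scans one row: lanes stride over the row in WARP_SIZE-wide chunks,
// carrying the running total of previously scanned chunks across iterations.
__global__ void cumsum_rows(const float * src, float * dst, int ncols) {
    const float * row_in  = src + (size_t) blockIdx.x * ncols;
    float       * row_out = dst + (size_t) blockIdx.x * ncols;
    const int lane = threadIdx.x % WARP_SIZE;

    float carry = 0.0f;
    for (int i0 = 0; i0 < ncols; i0 += WARP_SIZE) {
        const int i = i0 + lane;
        const float v = i < ncols ? row_in[i] : 0.0f; // out-of-range lanes contribute 0
        const float scan = warp_prefix_inclusive_sum(v);
        if (i < ncols) {
            row_out[i] = carry + scan;
        }
        // The last lane holds the chunk total; broadcast it into the carry.
        carry += __shfl_sync(0xFFFFFFFFu, scan, WARP_SIZE - 1, WARP_SIZE);
    }
}
```

A one-warp-per-row launch for this sketch would be `cumsum_rows<<<nrows, WARP_SIZE, 0, stream>>>(src, dst, ncols);`. All 32 lanes always execute the shuffles (only the store is guarded), so there is no divergence inside the sync'd shuffle calls.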
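The "Implement using CUB with fallback to custom kernel" commit suggests the contiguous 1-D case can be delegated to CUB where available. A sketch of that calling convention, assuming `cudaMallocAsync` for scratch space (the actual ggml code would draw from its own CUDA memory pool instead):

```cuda
#include <cub/cub.cuh>

// cub::DeviceScan::InclusiveSum is called twice: once with a null temporary
// buffer to query the required scratch size, then again to run the scan.
static void cumsum_cub(const float * d_in, float * d_out, int n, cudaStream_t stream) {
    void * d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceScan::InclusiveSum(d_temp, temp_bytes, d_in, d_out, n, stream);
    cudaMallocAsync(&d_temp, temp_bytes, stream);
    cub::DeviceScan::InclusiveSum(d_temp, temp_bytes, d_in, d_out, n, stream);
    cudaFreeAsync(d_temp, stream);
}
```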
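TRI, by contrast, is essentially an elementwise masked copy. A sketch assuming a lower-triangular variant that zero-fills above the diagonal (hypothetical names; ggml's TRI op has several variants, and the PR's kernel additionally handles strides and batched tensors, per the "Fix strides" and "Optimize TRI" commits):

```cuda
// Keep the lower triangle (col <= row), zero the rest.
__global__ void tri_lower(const float * src, float * dst, int ncols, int nrows) {
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y;
    if (col >= ncols || row >= nrows) {
        return;
    }
    const size_t idx = (size_t) row * ncols + col;
    dst[idx] = col <= row ? src[idx] : 0.0f;
}
```

A matching launch for this sketch: `dim3 grid((ncols + 255) / 256, nrows); tri_lower<<<grid, 256, 0, stream>>>(src, dst, ncols, nrows);`.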