PR #13291 cuda: refactored ssm_scan and use CUB

cuda: refactored ssm_scan to use CUB

Your-Cheese committed 331 days ago

fixed compilation error when not using CUB

Your-Cheese committed 331 days ago

assign L to constant and use size_t instead of int

Your-Cheese committed 324 days ago

deduplicated functions

Your-Cheese committed 324 days ago

change min blocks per mp to 1

Your-Cheese committed 324 days ago

Use cub load and store warp transpose

Your-Cheese committed 324 days ago

Merge https://github.com/ggml-org/llama.cpp into ssm_scan_cub

Your-Cheese committed 239 days ago

suppress clang warning

Your-Cheese committed 235 days ago

llama.cpp PR #13291 (cuda: refactored ssm_scan and use CUB): Merged