llama.cpp
ggml : add ggml_soft_max_ext (#4256, Merged)
Commits (14)
All 14 commits authored by ggerganov, committed 2 years ago:

- metal : implement soft_max_ext
- cuda : implement soft_max_ext
- ggml : implement soft_max_ext (CPU) (see the semantics sketch after this list)
- batched-bench : print threads
- metal : simplify soft_max encoding
- cuda : use 512 threads for soft_max instead of 32
- ggml : update soft max cpu
- cuda : do warp-based block reduce (see the CUDA sketch after this list)
- cuda : increase max block size to 1024
- cuda : fix warp reduction initialization of shared mem
- metal : warp-based reduction for soft max kernel
- metal : warp-based reduce for rms_norm
- metal : simplify soft max kernel
- alloc : fix build with debug
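The operator added here fuses the attention scale and the additive mask into the softmax itself, so each backend can compute softmax(x*scale + mask) in a single kernel instead of running scale, mask add, and softmax as separate graph ops over the full KQ tensor. A minimal sketch of the per-row semantics, assuming the op computes softmax(x*scale + mask); `soft_max_ext_row` is an illustrative reference helper, not a ggml function:

```c
#include <math.h>
#include <stddef.h>

// Illustrative per-row semantics of the fused op: y = softmax(x*scale + mask).
// This is a reference sketch, not ggml's actual implementation.
static void soft_max_ext_row(float * y, const float * x, const float * mask,
                             float scale, size_t n) {
    // Fused pass: scale and mask are applied while scanning for the row max
    // (subtracting the max keeps expf from overflowing).
    float max_val = -INFINITY;
    for (size_t i = 0; i < n; i++) {
        y[i] = x[i]*scale + (mask ? mask[i] : 0.0f);
        max_val = fmaxf(max_val, y[i]);
    }

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        y[i] = expf(y[i] - max_val);
        sum += y[i];
    }

    for (size_t i = 0; i < n; i++) {
        y[i] /= sum;
    }
}
```

The payoff of the fusion is avoiding two extra full passes over the attention scores: the scaled and masked values never need to be materialized as intermediate tensors.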
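The later CUDA commits ("do warp-based block reduce", "increase max block size to 1024", "fix warp reduction initialization of shared mem") name the standard two-level reduction pattern. A sketch of that pattern for the row max, assuming 32-lane warps and blocks of up to 1024 threads; the function names are illustrative, not the PR's exact code:

```cuda
#include <math.h>

// Warp-level max via register shuffles: each iteration halves the distance
// between exchanging lanes until all 32 lanes hold the warp max.
__device__ float warp_reduce_max(float x) {
#pragma unroll
    for (int offset = 16; offset > 0; offset >>= 1) {
        x = fmaxf(x, __shfl_xor_sync(0xffffffff, x, offset));
    }
    return x;
}

// Block-level max for up to 1024 threads (32 warps). After the call, every
// lane of warp 0 holds the block max.
__device__ float block_reduce_max(float x) {
    __shared__ float warp_max[32]; // one slot per warp

    const int lane    = threadIdx.x % 32;
    const int warp_id = threadIdx.x / 32;

    x = warp_reduce_max(x);        // 1) reduce within each warp
    if (lane == 0) {
        warp_max[warp_id] = x;     // 2) warp leaders publish their result
    }
    __syncthreads();

    // 3) warp 0 reduces the per-warp results; lanes past the number of
    //    active warps are seeded with -INFINITY so stale shared memory is
    //    never read (the "initialization of shared mem" fix).
    const int n_warps = (blockDim.x + 31) / 32;
    x = (lane < n_warps) ? warp_max[lane] : -INFINITY;
    if (warp_id == 0) {
        x = warp_reduce_max(x);
    }
    return x;
}
```

The same structure with addition in place of fmaxf gives the sum reduction that softmax's normalization pass needs, which is what lets the block size grow to 1024 threads without shared-memory tree reductions.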