llama.cpp
ggml : add ggml_soft_max_ext
#4256
Merged

ggerganov merged 14 commits into master from gg/soft-max-ext
ggerganov
ggerganov metal : implement soft_max_ext
e89597c0
ggerganov cuda : implement soft_max_ext
88519fbf
ggerganov ggml : implement soft_max_ext (CPU)
6a66f69f
ggerganov batched-bench : print threads
390a4459
ggerganov force pushed to 390a4459 2 years ago
ggerganov metal : simplify soft_max encoding
580fe206
slaren
ggerganov
ggerganov
ggerganov commented on 2023-11-30
ggerganov cuda : use 512 threads for soft_max instead of 32
ebd062bc
ggerganov force pushed to ebd062bc 2 years ago
ggerganov
ggerganov commented on 2023-11-30
ggerganov ggml : update soft max cpu
c7c8dabc
ggerganov
ggerganov commented on 2023-11-30
ggerganov cuda : do warp-based block reduce
62532c05
ggerganov cuda : increase max block size to 1024
6b86bcff
slaren
ggerganov cuda : fix warp reduction initialization of shared mem
68e02c0d
ggerganov
ggerganov metal : warp-based reduction for soft max kernel
55717c98
slaren
ggerganov metal : warp-based reduce for rms_norm
c4db5923
ggerganov metal : simplify soft max kernel
d9c8fa3b
ggerganov alloc : fix build with debug
eb594c0f
ggerganov merged ef47ec18 into master 2 years ago
LostRuins
