ggml : add ggml_soft_max_ext #4256
metal : implement soft_max_ext (e89597c0)
cuda : implement soft_max_ext (88519fbf)
ggml : implement soft_max_ext (CPU) (6a66f69f)
batched-bench : print threads (390a4459)
ggerganov force pushed to 390a4459 2 years ago
metal : simplify soft_max encoding (580fe206)
cuda : use 512 threads for soft_max instead of 32 (ebd062bc)
ggerganov force pushed to ebd062bc 2 years ago
ggml : update soft max cpu (c7c8dabc)
cuda : do warp-based block reduce (62532c05)
cuda : increase max block size to 1024 (6b86bcff)
cuda : fix warp reduction initialization of shared mem (68e02c0d)
metal : warp-based reduction for soft max kernel (55717c98)
metal : warp-based reduce for rms_norm (c4db5923)
metal : simplify soft max kernel (d9c8fa3b)
alloc : fix build with debug (eb594c0f)
ggerganov merged ef47ec18 into master 2 years ago