ggml : add ggml_soft_max_ext #4256
metal : implement soft_max_ext (e89597c0)
cuda : implement soft_max_ext (88519fbf)
ggml : implement soft_max_ext (CPU) (6a66f69f)
batched-bench : print threads (390a4459)
ggerganov force pushed to 390a4459 2 years ago
metal : simplify soft_max encoding (580fe206)
cuda : use 512 threads for soft_max instead of 32 (ebd062bc)
ggerganov force pushed to ebd062bc 2 years ago
ggml : update soft max cpu (c7c8dabc)
cuda : do warp-based block reduce (62532c05)
cuda : increase max block size to 1024 (6b86bcff)
cuda : fix warp reduction initialization of shared mem (68e02c0d)
metal : warp-based reduction for soft max kernel (55717c98)
metal : warp-based reduce for rms_norm (c4db5923)
metal : simplify soft max kernel (d9c8fa3b)
alloc : fix build with debug (eb594c0f)
ggerganov merged ef47ec18 into master 2 years ago