llama.cpp
ggml : add ggml_soft_max_ext
#4256
Merged

Commits
  • metal : implement soft_max_ext
    ggerganov committed 2 years ago
  • cuda : implement soft_max_ext
    ggerganov committed 2 years ago
  • ggml : implement soft_max_ext (CPU)
    ggerganov committed 2 years ago
  • batched-bench : print threads
    ggerganov committed 2 years ago
  • metal : simplify soft_max encoding
    ggerganov committed 2 years ago
  • cuda : use 512 threads for soft_max instead of 32
    ggerganov committed 2 years ago
  • ggml : update soft max cpu
    ggerganov committed 2 years ago
  • cuda : do warp-based block reduce
    ggerganov committed 2 years ago
  • cuda : increase max block size to 1024
    ggerganov committed 2 years ago
  • cuda : fix warp reduction initialization of shared mem
    ggerganov committed 2 years ago
  • metal : warp-based reduction for soft max kernel
    ggerganov committed 2 years ago
  • metal : warp-based reduce for rms_norm
    ggerganov committed 2 years ago
  • metal : simplify soft max kernel
    ggerganov committed 2 years ago
  • alloc : fix build with debug
    ggerganov committed 2 years ago