llama.cpp
ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext #14435

Merged

ggerganov merged 4 commits into master from gg/ggml-batch-soft-max-ops
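
As a rough illustration of what the new broadcast support allows, here is a minimal sketch (not code from the PR) that builds a masked softmax where a single per-sequence mask is reused across all attention heads. The `ggml_soft_max_ext(ctx, a, mask, scale, max_bias)` signature is the existing ggml API; all tensor shapes and the scale value are illustrative assumptions.

```c
// Sketch: ggml_soft_max_ext with a mask broadcast across heads.
// ggml shapes are [ne0, ne1, ne2, ne3]; all sizes here are made up.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t n_kv = 256, n_tokens = 64, n_head = 8, n_seq = 4;

    // attention scores: one [n_kv, n_tokens] matrix per head, per sequence
    struct ggml_tensor * kq = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, n_kv, n_tokens, n_head, n_seq);

    // one mask per sequence; ne2 == 1, so it broadcasts across the head
    // dimension, the kind of non-2D mask this PR enables
    struct ggml_tensor * mask = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, n_kv, n_tokens, 1, n_seq);

    // scale = 1/sqrt(d_head) for an assumed d_head = 64; max_bias = 0 disables ALiBi
    struct ggml_tensor * probs = ggml_soft_max_ext(ctx, kq, mask, 0.125f, 0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, probs);
    // graph construction only; computing it is backend-specific

    ggml_free(ctx);
    return 0;
}
```

The point of the broadcast is that batched inference can pass one mask per sequence instead of materializing an expanded copy per head, which is presumably why the backend-specific follow-up commits below were needed.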
github-actions added the testing, ggml, and Apple Metal labels
ggerganov force-pushed from e6faa451 to 236682a7 130 days ago
github-actions added the Nvidia GPU, Vulkan, and Ascend NPU labels
ggerganov force-pushed from 236682a7 to 572a062e 130 days ago
ggerganov force-pushed from 572a062e to 852529e9 130 days ago
ggerganov force-pushed from 852529e9 to bdfd7b75 130 days ago
ggerganov marked this pull request as ready for review 130 days ago
github-actions added the SYCL label
ggerganov force-pushed from bdfd7b75 to 461cb2f3 129 days ago
JohannesGaessler requested a review from JohannesGaessler 126 days ago
Commits

32366701 ggerganov: ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)
b7265648 jeffbolznv: vulkan: support softmax/FA batch and broadcast (#14449)
3045a1eb JohannesGaessler: CUDA: broadcasting for FlashAttention mask (#14500)
3b38afdf am17an: CUDA: add softmax broadcast (#14475)
ggerganov force-pushed from be8d4700 to 3b38afdf 126 days ago
ggerganov merged 55a1c5a5 into master 126 days ago
ggerganov deleted the gg/ggml-batch-soft-max-ops branch 126 days ago
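
For the flash-attention path covered by the Vulkan and CUDA commits above, here is a companion sketch along the same lines. `ggml_flash_attn_ext(ctx, q, k, v, mask, scale, max_bias, logit_softcap)` is the existing ggml API; the F16 mask type and the padding of the mask's second dimension follow general ggml conventions (n_tokens is chosen here so no extra padding is needed), and the shapes are illustrative assumptions, not details quoted from this PR.

```c
// Sketch: batched flash attention with a per-sequence mask broadcast
// across heads (mask ne2 == 1). All sizes are illustrative.
#include <math.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 64*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t d_head = 128, n_head = 8, n_tokens = 64, n_kv = 256, n_seq = 4;

    struct ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_head, n_tokens, n_head, n_seq);
    struct ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv,     n_head, n_seq);
    struct ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv,     n_head, n_seq);

    // one mask per sequence, broadcast across the head dimension
    struct ggml_tensor * mask = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, n_kv, n_tokens, 1, n_seq);

    struct ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, mask,
            1.0f/sqrtf((float) d_head), /*max_bias=*/ 0.0f, /*logit_softcap=*/ 0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // graph construction only; run through a backend (CPU/CUDA/Vulkan/...) to compute

    ggml_free(ctx);
    return 0;
}
```

The vulkan and CUDA commits listed above add kernel-side support for this mask layout on their respective backends.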
