llama.cpp
ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext #14435

Merged

ggerganov merged 4 commits into master from gg/ggml-batch-soft-max-ops
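
As a rough illustration of what the new broadcast support allows, here is a minimal sketch (not code from the PR) that builds a masked softmax where a single per-sequence mask is reused across all attention heads. The `ggml_soft_max_ext(ctx, a, mask, scale, max_bias)` signature is the existing ggml API; all tensor shapes and the scale value are illustrative assumptions.

```c
// Sketch: ggml_soft_max_ext with a mask broadcast across heads.
// ggml shapes are [ne0, ne1, ne2, ne3]; all sizes here are made up.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t n_kv = 256, n_tokens = 64, n_head = 8, n_seq = 4;

    // attention scores: one [n_kv, n_tokens] matrix per head, per sequence
    struct ggml_tensor * kq = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, n_kv, n_tokens, n_head, n_seq);

    // one mask per sequence; ne2 == 1, so it broadcasts across the head
    // dimension, the kind of non-2D mask this PR enables
    struct ggml_tensor * mask = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, n_kv, n_tokens, 1, n_seq);

    // scale = 1/sqrt(d_head) for an assumed d_head = 64; max_bias = 0 disables ALiBi
    struct ggml_tensor * probs = ggml_soft_max_ext(ctx, kq, mask, 0.125f, 0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, probs);
    // graph construction only; computing it is backend-specific

    ggml_free(ctx);
    return 0;
}
```

The point of the broadcast is that batched inference can pass one mask per sequence instead of materializing an expanded copy per head, which is presumably why the backend-specific follow-up commits below were needed.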
github-actions added the testing, ggml, and Apple Metal labels
ggerganov force-pushed from e6faa451 to 236682a7 130 days ago
github-actions added the Nvidia GPU, Vulkan, and Ascend NPU labels
ggerganov force-pushed from 236682a7 to 572a062e 130 days ago
ggerganov force-pushed from 572a062e to 852529e9 130 days ago
ggerganov force-pushed from 852529e9 to bdfd7b75 130 days ago
ggerganov marked this pull request as ready for review 130 days ago
github-actions added the SYCL label
ggerganov force-pushed from bdfd7b75 to 461cb2f3 129 days ago
JohannesGaessler requested a review from JohannesGaessler 126 days ago
Commits

32366701 ggerganov: ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)
b7265648 jeffbolznv: vulkan: support softmax/FA batch and broadcast (#14449)
3045a1eb JohannesGaessler: CUDA: broadcasting for FlashAttention mask (#14500)
3b38afdf am17an: CUDA: add softmax broadcast (#14475)
ggerganov force-pushed from be8d4700 to 3b38afdf 126 days ago
ggerganov merged 55a1c5a5 into master 126 days ago
ggerganov deleted the gg/ggml-batch-soft-max-ops branch 126 days ago
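
For the flash-attention path covered by the Vulkan and CUDA commits above, here is a companion sketch along the same lines. `ggml_flash_attn_ext(ctx, q, k, v, mask, scale, max_bias, logit_softcap)` is the existing ggml API; the F16 mask type and the padding of the mask's second dimension follow general ggml conventions (n_tokens is chosen here so no extra padding is needed), and the shapes are illustrative assumptions, not details quoted from this PR.

```c
// Sketch: batched flash attention with a per-sequence mask broadcast
// across heads (mask ne2 == 1). All sizes are illustrative.
#include <math.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 64*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t d_head = 128, n_head = 8, n_tokens = 64, n_kv = 256, n_seq = 4;

    struct ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_head, n_tokens, n_head, n_seq);
    struct ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv,     n_head, n_seq);
    struct ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv,     n_head, n_seq);

    // one mask per sequence, broadcast across the head dimension
    struct ggml_tensor * mask = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, n_kv, n_tokens, 1, n_seq);

    struct ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, mask,
            1.0f/sqrtf((float) d_head), /*max_bias=*/ 0.0f, /*logit_softcap=*/ 0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // graph construction only; run through a backend (CPU/CUDA/Vulkan/...) to compute

    ggml_free(ctx);
    return 0;
}
```

The vulkan and CUDA commits listed above add kernel-side support for this mask layout on their respective backends.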
