llama.cpp
vulkan: Use fp16 for the flash attention P*V multiplication
#12783

Merged

0cc4m merged 1 commit into ggml-org:master from jeffbolznv:flash_attn_prec

vulkan: Use fp16 for the flash attention P*V multiplication

dab1f028

github-actions added the Vulkan and ggml labels

jeffbolznv requested a review from 0cc4m 1 year ago

0cc4m approved these changes on 2025-04-09

0cc4m merged 7ecd780b into master 1 year ago

Reviewers

0cc4m

Labels

Vulkan, ggml