llama.cpp
80b717d4 - vulkan: Use unclamped loads for flash attention mask (#12720)

Commit message (138 days ago):

vulkan: Use unclamped loads for flash attention mask (#12720)

nem1 must be a multiple of GGML_KQ_MASK_PAD, and GGML_KQ_MASK_PAD is a multiple of the number of rows in the matrix. The KV dim is a multiple of the number of columns for the aligned shader.
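The commit message states the invariant that justifies dropping the bounds clamp: the mask's padded row count (nem1) and the KV dimension are both multiples of the tile size, so a full tile load can never run past the end of the mask buffer. The GLSL compute sketch below is illustrative only, not the actual flash_attn_cm2.comp code; the names Br, Bc, kv_len, mask_tile, the float element type, and the workgroup layout are assumptions chosen to show the idea.

```glsl
#version 450
// Hypothetical sketch (not the real flash_attn_cm2.comp code) of why the
// KQ-mask tile load can skip per-element clamping:
//   - ggml pads the mask's row count (nem1) to a multiple of GGML_KQ_MASK_PAD,
//   - GGML_KQ_MASK_PAD is a multiple of the tile row count (Br below),
//   - in the aligned shader the KV dimension is a multiple of the tile column
//     count (Bc below),
// so a full Br x Bc tile at an aligned (row, col) offset is always in bounds.

const uint Br = 16;   // tile rows   (illustrative value)
const uint Bc = 16;   // tile cols   (illustrative value)

layout(binding = 0, std430) readonly buffer Mask { float mask_data[]; };
layout(push_constant) uniform PC { uint kv_len; };   // multiple of Bc on the aligned path

shared float mask_tile[Br * Bc];

layout(local_size_x = 64) in;

void main() {
    uint row0 = gl_WorkGroupID.y * Br;   // in bounds: nem1 is a multiple of Br
    uint col0 = gl_WorkGroupID.x * Bc;   // in bounds: kv_len is a multiple of Bc

    // Unclamped loads: no per-element "row < nem1 && col < kv_len" check,
    // because the padding guarantees the whole tile lies inside the buffer.
    for (uint i = gl_LocalInvocationIndex; i < Br * Bc; i += gl_WorkGroupSize.x) {
        uint r = i / Bc;
        uint c = i % Bc;
        mask_tile[i] = mask_data[(row0 + r) * kv_len + (col0 + c)];
    }
}
```

The payoff, per the commit message, is that the mask load needs no bounds checks at all; that is safe only because of the padding guarantees spelled out above.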
Files changed:
  • ggml/src/ggml-vulkan/ggml-vulkan.cpp
  • ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp