llama.cpp
716bd6de - vulkan: optimize mul_mat for small values of N (#10991)

Commit
166 days ago
vulkan: optimize mul_mat for small values of N (#10991)

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
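
The loop ordering described in the commit message can be illustrated with a minimal CPU sketch. This is not the shader code from the commit. The function name `mul_mat_small_n`, the parameters `b_stride` and `d_stride`, and the memory layout are assumptions made for illustration, and the template parameter `NUM_COLS` only stands in for the spec constant of the same name.

```cpp
#include <array>
#include <cstddef>

// Hypothetical CPU analogue of a mat-vec kernel generalized to NUM_COLS
// columns of B. Names and layout are illustrative assumptions, not the
// actual Vulkan shader interface.
//
// a: row-major matrix A, rows x k
// b: NUM_COLS columns of B, column j starts at b + j * b_stride
// d: NUM_COLS output columns, column j starts at d + j * d_stride
template <std::size_t NUM_COLS>
void mul_mat_small_n(const float* a, const float* b, float* d,
                     std::size_t rows, std::size_t k,
                     std::size_t b_stride, std::size_t d_stride) {
    for (std::size_t row = 0; row < rows; ++row) {
        // One accumulator per column of B, zero-initialized.
        std::array<float, NUM_COLS> acc{};
        for (std::size_t i = 0; i < k; ++i) {
            // Each element of A is loaded once and reused for every column.
            const float a_val = a[row * k + i];
            // Loads from B sit in the innermost loop.
            for (std::size_t j = 0; j < NUM_COLS; ++j) {
                acc[j] += a_val * b[j * b_stride + i];
            }
        }
        // Write one result per column for this row.
        for (std::size_t j = 0; j < NUM_COLS; ++j) {
            d[j * d_stride + row] = acc[j];
        }
    }
}
```

Calling `mul_mat_small_n<1>` reduces to an ordinary matrix-vector product. Because `NUM_COLS` is fixed at compile time, the inner loop can be fully unrolled, which is loosely comparable to fixing a spec constant when a pipeline is created.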
  • ggml/src/ggml-vulkan
    • ggml-vulkan.cpp
    • vulkan-shaders
      • mul_mat_vec.comp
      • mul_mat_vec_base.comp
      • mul_mat_vec_q2_k.comp
      • mul_mat_vec_q3_k.comp
      • mul_mat_vec_q4_k.comp
      • mul_mat_vec_q5_k.comp
      • mul_mat_vec_q6_k.comp
  • tests
    • test-backend-ops.cpp