llama.cpp
e8c54893 - ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

Commit
7 days ago
ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

* Start work on flash_attn refactor
* Refactor
* Split k/v quantization
* Refactor and abstract quantization logic for flash_attn and mul_mat
* Add quantization support to tile path
* formatting
* Move to functions, add a check