llama.cpp
e8c54893 - ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

Commit

7 days ago

ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

* Start work on flash_attn refactor
* Refactor
* Split k/v quantization
* Refactor and abstract quantization logic for flash_attn and mul_mat
* Add quantization support to tile path
* formatting
* Move to functions, add a check

References

#23834 - ggml-webgpu: FlashAttention refactor + standardize quantization support

Author

reeselevine

Parents

3c7450ce