CUDA: Prefer vector flash decoding kernel for Gemma models #12738
Prefer vector flash decoding kernel for Gemma models
f7d07dd2
Update ggml/src/ggml-cuda/fattn.cu
ce71aba0
gaugarg-nv deleted the gemma_flash_attention branch 63 days ago