llama.cpp
llama : use F32 precision in Qwen2 attention and no FA
#8412
Merged


ggerganov merged 1 commit into master from gg/qwen2-f32-prec

Commit 7c9e9a22 (llama : use F32 precision in Qwen2 attention and no FA)

JohannesGaessler approved these changes on 2024-07-10
ggerganov merged 7a221b67 into master 1 year ago
ggerganov deleted the gg/qwen2-f32-prec branch 1 year ago
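Per the title, this change makes Qwen2's attention use F32 precision and disables Flash Attention for that architecture. A likely motivation for such precision bumps is that accumulating attention logits in fp16 can overflow or lose accuracy; the sketch below illustrates that failure mode with NumPy. The head size and value magnitudes are hypothetical, chosen only to make the overflow visible, and are not taken from the PR:

```python
import numpy as np

head_dim = 128  # hypothetical head size, for illustration only
q = np.full(head_dim, 30.0, dtype=np.float16)
k = np.full(head_dim, 30.0, dtype=np.float16)

# Accumulate the Q.K dot product entirely in fp16: each partial sum is
# rounded to fp16, and the running total eventually exceeds the fp16
# maximum (~65504), overflowing to inf.
logit_f16 = np.float16(0)
for a, b in zip(q, k):
    logit_f16 = np.float16(logit_f16 + a * b)

# Accumulating the same dot product in fp32 stays well within range.
logit_f32 = np.dot(q.astype(np.float32), k.astype(np.float32))

print(logit_f16)  # inf
print(logit_f32)  # 115200.0
```

Forcing F32 precision on the attention matmul trades some speed and memory for numerically stable logits, which is why such changes are typically applied per-architecture rather than globally.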
