llama.cpp
llama : use F32 precision in Qwen2 attention and no FA
#8412
Merged

Commits
  • llama : use F32 precision in Qwen2 attention and no FA
    ggerganov committed 1 year ago