llama.cpp
03562f3a - llama : support attention bias on LLaMA architecture (#4283)

Commit
1 year ago
llama : support attention bias on LLaMA architecture (#4283) * Support attention_bias on LLaMA architecture QKVO bias, should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608). * check existence of qkvo bias while loading llama models Tested on LLaMA2, CUDA and CPU. * Update llama.cpp
Author
Parents
Loading