llama.cpp
03562f3a - llama : support attention bias on LLaMA architecture (#4283)

Commit

1 year ago

llama : support attention bias on LLaMA architecture (#4283)

* Support attention_bias on LLaMA architecture

  QKVO bias, should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608).

* check existence of qkvo bias while loading llama models

  Tested on LLaMA2, CUDA and CPU.

* Update llama.cpp
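As a rough illustration of the idea in this commit message, the sketch below treats the bias on an attention projection (Q, K, V, or O) as optional: it is added only when the checkpoint actually provided one, so models without attention bias behave as before. This is not the code from the commit. `Projection` and `apply_projection` are hypothetical names invented for this example, plain `std::vector<float>` stands in for the tensors llama.cpp really uses, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for one attention projection (Q, K, V, or O).
// Not a llama.cpp type; used here only to illustrate an optional bias.
struct Projection {
    std::vector<float> weight;               // row-major, rows x cols
    std::optional<std::vector<float>> bias;  // empty when the model file has no bias
    std::size_t rows = 0;
    std::size_t cols = 0;
};

// Computes y = W x, and adds b only if a bias was loaded.
std::vector<float> apply_projection(const Projection & p, const std::vector<float> & x) {
    std::vector<float> y(p.rows, 0.0f);
    for (std::size_t r = 0; r < p.rows; ++r) {
        for (std::size_t c = 0; c < p.cols; ++c) {
            y[r] += p.weight[r * p.cols + c] * x[c];
        }
        if (p.bias) {
            y[r] += (*p.bias)[r];  // skipped entirely for bias-free models
        }
    }
    return y;
}
```

With a 2x2 weight of `{1, 2, 3, 4}` and input `{1, 1}`, the bias-free result is `{3, 7}`. Setting `bias` to `{0.5, -1}` gives `{3.5, 6}`. Keeping the bias optional rather than required is what lets a single code path serve both plain LLaMA checkpoints and the bias-carrying variants named in the message.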

References

#4283 - Support attention_bias on LLaMA architecture

Author

RealJosephus

Parents

37c746d6
