llama : grouped-query attention + LLaMAv2 70B support #2276
CUDA: GQA implementation
c1893cd9
ggerganov force pushed from f7bb5e91 2 years ago
ggerganov changed the title from "llama : poc for running 70B on CPU (WIP)" to "llama : guided-query attention + LLaMAv2 70B support" 2 years ago
ggerganov changed the title from "llama : guided-query attention + LLaMAv2 70B support" to "llama : grouped-query attention + LLaMAv2 70B support" 2 years ago
ggerganov marked this pull request as ready for review 2 years ago
llama : support for GQA and LLaMAv2 70B
3fdc00f5
ggerganov force pushed to 3fdc00f5 2 years ago
py : fix hparams parsing (if-else blocks)
2dac31b3
py : oh boy ..
c594992d
klosax commented on 2023-07-23
help : fix gqa value for 70B
8194d591
Merge branch 'master' into llama-v2-70b
3ed2553f
ggerganov merged e76d630d into master 2 years ago
ggerganov deleted the llama-v2-70b branch 2 years ago