ggerganov/llama.cpp

Commits

Fix q_xxs using mul_mat_q

Aidan committed 1 year ago

4b177010

metal : disable FA kernel for HS=256 (#7556)

ggerganov committed 1 year ago

Verified 62bfef51

llama : add comments about experimental flags (#7544)

ggerganov committed 1 year ago

Verified eaf6e031

github: add self sorted issue ticket forms (#7543)

mofosyne committed 1 year ago

Verified d6ef0e77

flake.lock: Update (#7540)

ggerganov committed 1 year ago

Verified dff451cf

main: replace --no-special with --special (#7534)

mofosyne committed 1 year ago

Verified d298382a

Fix aya-23 conversion scripts (#7539)

Galunid committed 1 year ago

Verified 32a28217

llama : add Smaug 70B support (#7402)

bartowski1182 committed 1 year ago

Verified c429b33b

Readme: add akx/ggify to tools (#1484)

akx committed 1 year ago

Verified 9146d36f

SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480)

hanishkvc committed 1 year ago

Verified b9adcbbf

train : change default FA argument (#7528)

ggerganov committed 1 year ago

Verified 9588f196

labeler: added Apple Metal detector (+Kompute) (#7529)

mofosyne committed 1 year ago

Verified 3cbd23ed

main : don't print special tokens with --grammar (#6923)

jart committed 1 year ago

Verified 00c63907

ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433)

msy-kato committed 1 year ago

Verified faa0e697

android : module (#7502)

eltonkola committed 1 year ago

Verified 9791f402

fix missing slash in `fs_get_cache_directory()` (#7503)

ngxson committed 1 year ago

Verified 902184dd

Make tokenize CLI tool have nicer command line arguments. (#6188)

Noeda committed 1 year ago

Verified 57684331

gguf-py : fix and simplify quantized shape round-trip (#7483)

compilade committed 1 year ago

Verified b83bab15

flake.lock: Update (#7232)

ggerganov committed 1 year ago

Verified d041d2ce

docker.yml: disable light-intel and server-intel test (#7515)

mofosyne committed 1 year ago

Verified 27891f6d

Add support for ArcticForCausalLM (#7020)

fairydreaming committed 1 year ago

Verified fbca2f27

add build shared lib in win release package (#7438)

arthw committed 1 year ago

Verified 0df0aa8e

readme : remove trailing space (#7469)

ggerganov committed 1 year ago

Verified 74f33adf

ggml : silence UB sanitizer error during iq2_xxs quantization (#0)

ggerganov committed 1 year ago

Verified 1debe727

Fix phi3 chat template confusion with zephyr (#7449)

tristandruyen committed 1 year ago

Verified 007489e8

readme : add Bunny in supported models [no ci] (#7469)

criminact committed 1 year ago

Verified 8b94e799

llama : add getters for n_threads/n_threads_batch (#7464)

danbev committed 1 year ago

Verified 3015851c

ci : use Pythia models instead of OpenLlama (#7470)

ggerganov committed 1 year ago

Verified 55ac3b7a

readme : add GPT-NeoX + Pythia to the list of supported models (#7491)

felladrin committed 1 year ago

Verified dacfcebd

Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461)

fairydreaming committed 1 year ago

Verified 9b82476e
