ochafik/llama.cpp

Commits

Skip even more operations when we're not at the last eval batch

ochafik committed 2 years ago

c35eb42c

Skip computation of unused logits during batch prompt eval (drop other batch positions after writing their kv to cache)

ochafik committed 2 years ago

ad7ef7af

Fix unicode in grammars (fixes #2501) (#2553)

ejones committed 2 years ago

Verified 604b8bdf

server : support for saving templates in browser LocalStorage (#2486)

staviq committed 2 years ago

Verified 10151bee

README: fix LLAMA_CUDA_MMV_Y documentation (#2647)

JohannesGaessler committed 2 years ago

Verified 0992a7b8

[Zig] Fixing Zig build and improvements (#2554)

SlyEcho committed 2 years ago

Verified 6ddeefad

Add --cfg-negative-prompt-file option for examples (#2591)

KerfuffleV2 committed 2 years ago

Verified 8dae7ce6

llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)

ggerganov committed 2 years ago

Verified a73ccf1a

tests : adds simple llama grammar tests (#2618)

drbh committed 2 years ago

Verified 7cf54e1f

ggml-alloc : fix discrepency between measure&eval (#2639)

lshzh-ww committed 2 years ago

Verified a872a2b2

cmake : install ggml-meta.metal if LLAMA_METAL (#2449)

ickc committed 2 years ago

Verified 0919a0f7

metal : print error of load pipeline state (#2564)

jhen0409 committed 2 years ago

Verified ed53db86

metal : enable ggml-alloc (#2627)

lshzh-ww committed 2 years ago

Verified fc8ef549

metal : matrix-matrix multiplication kernel (#2615)

lshzh-ww committed 2 years ago

Verified bf83bff6

scripts : add helper script to get wikitext

ggerganov committed 2 years ago

Verified b5ffb284

server : add missing /json-schema-to-grammar.mjs (#2616)

jhen0409 committed 2 years ago

Verified 3ebb0093

metal : return null instead of exit(1) (#2573)

jhen0409 committed 2 years ago

Verified d783f798

server : add --numa support (#2524)

TerrorJack committed 2 years ago

Verified d75561df

llama : add missing enum keyword in function signatures (#2610)

cztomsik committed 2 years ago

Verified 348acf18

CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596)

JohannesGaessler committed 2 years ago

Verified 1cd06fa2

server : fix default grammar by use empty string in the UI (#2604)

jhen0409 committed 2 years ago

Verified 2feb8934

server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588)

jhen0409 committed 2 years ago

Verified 5517d6e6

Enhance Windows 7 and below compatibility. (#2592)

vxiiduu committed 2 years ago

Verified f31b5397

test : add simple grammar parsing tests (#2594)

drbh committed 2 years ago

Verified ee77efea

CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590)

JohannesGaessler committed 2 years ago

Verified f64d44a9

Adding support for llama2.c models (#2559)

byte-6174 committed 2 years ago

Verified b19edd54

server: fixed wrong variable name in timing json (#2579)

Equim-chan committed 2 years ago

Verified 53dc3994

Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.

DannyDaemonic committed 2 years ago

Verified 9ca4abed

Add --n-predict -2 for stopping generation on full context (#2565)

crasm committed 2 years ago

Verified e59fcb2b

Fix grammar-based sampling issue in server (#2566)

krasserm committed 2 years ago

Verified 16387577
