ggerganov/llama.cpp

Pull Requests Commits

Minor, plus rebase on master

Iwan Kawrakow committed 2 years ago

a0242a83

RMSE-optimized quants for all quantization types

Iwan Kawrakow committed 2 years ago

e435bfd9

ggml : fix Q4_3 cuBLAS

ggerganov committed 2 years ago

Verified 0e018fe0

ci : trigger CI for drafts, but not most PR actions (#1125)

sw committed 2 years ago

Verified 857308d1

Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)

sw committed 2 years ago

Verified c50b6288

ggml : unit test for quantization functions (#953)

unbounded committed 2 years ago

Verified 5f939498

llama : print timings on ctrl+c exit (#1021)

wbpxre150 committed 2 years ago

Verified 36b4f7e0

llama : have n_batch default to 512 (#1091)

eiery committed 2 years ago

Verified 10f19c11

cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100)

howard0su committed 2 years ago

Verified 7e312f16

ggml : fix AVX build + update to new Q8_0 format

ggerganov committed 2 years ago

872c365a

ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)

ggerganov committed 2 years ago

Verified 955ef9a5

ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099)

sw committed 2 years ago

Verified c5aa5e57

examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107)

HanClinto committed 2 years ago

Verified e9a9cb0c

llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105)

xaedes committed 2 years ago

Verified b6e7f9b0

Improve cuBLAS performance by using a memory pool (#1094)

slaren committed 2 years ago

Verified 50cb666b

llama : fixed rlimit error message (#888)

apaz-cli committed 2 years ago

Verified 25d7abbd

cmake : link threads publicly to ggml (#1042)

fumiama committed 2 years ago

Verified 018f2279

main : evaluate tokens in batches after swapping context (#1014)

grencez committed 2 years ago

Verified 94112882

llama : remember and restore kv cache data pointers (#1104)

xaedes committed 2 years ago

Verified 8687c1f2

ggml : a faster version for Q4_1 x Q8_0 dot products (#1083)

ikawrakow committed 2 years ago

Verified 1bfc153e

Show perplexity ETA in hours and minutes (#1096)

slaren committed 2 years ago

Verified 3d59769c

llama : fix comment for "output.weight" tensor

ggerganov committed 2 years ago

Verified d40fded9

Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088)

sw committed 2 years ago

Verified 2510c183

ggml : sync ggml (add GPT-NeoX RoPE implementation)

ggerganov committed 2 years ago

Verified 12b5900d

ggml : fix bug in ggml_compute_forward_dup_f32()

ggerganov committed 2 years ago

Verified 9ff334f3

Add Q4_3 support to cuBLAS (#1086)

slaren committed 2 years ago

Verified 2005469e

ggml : do not break cuBLAS build (Q4_3 is not yet implemented)

ggerganov committed 2 years ago

Verified 8a1756ab

ggml : fix Q4_3 quantization

ggerganov committed 2 years ago

Verified 66aab460

llama : multi-threaded quantization (#1075)

ikawrakow committed 2 years ago

Verified 38de86a7

ggml : add Q4_3 quantization (#1082)

ggerganov committed 2 years ago

Verified e0305ead

Older