ggerganov/llama.cpp

safer jinja `llama_chat_templates` struct

ngxson committed 1 year ago

c9e7cbb0

minja: fix vigogne (https://github.com/google/minja/pull/22)

ochafik committed 1 year ago

cc503564

Disable jinja test that has a cryptic windows failure

ochafik committed 1 year ago

e3c475cd

Add missing optional include to server.cpp

ochafik committed 1 year ago

0e74c9da

Rm unused optional include

ochafik committed 1 year ago

fc60802b

Fix copy elision warning

ochafik committed 1 year ago

5074e6fe

Flush stdout in chat template before potential crash

ochafik committed 1 year ago

33322e82

Forward decl minja::chat_template to avoid eager json dep

ochafik committed 1 year ago

e63520f3

Normalize newlines in test-chat-templates for windows tests

ochafik committed 1 year ago

ee1e10e2

Revert LLAMA_CHATML_TEMPLATE refactor

ochafik committed 1 year ago

d5fa351a

Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

ochafik committed 1 year ago

81c0d437

Merge remote-tracking branch 'origin/master' into jinja

ochafik committed 1 year ago

40db7896

Refactor common_chat_* functions to accept minja template + use_jinja option

ochafik committed 1 year ago

b75d0622

llama.android: add field formatChat to control whether to parse special tokens when send message (#11270)

codezjx committed 1 year ago

Verified 3edfa7d3

rpc : early register backend devices (#11262)

rgerganov committed 1 year ago

Verified 667d7284

vocab : fix double-eos check (#11273)

ggerganov committed 1 year ago

Verified a133566d

llama : fix deprecation message: vocabable -> vocab (#11269)

dwrensha committed 1 year ago

Verified 960ec652

README : added kalavai to infrastructure list (#11216)

musoles committed 1 year ago

Verified 7a689c41

vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)

jeffbolznv committed 1 year ago

Verified bd38ddea

vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)

jeffbolznv committed 1 year ago

Verified 466300fe

vulkan: optimize coopmat2 q2_k dequant function (#11130)

jeffbolznv committed 1 year ago

Verified 206bc534

llama : add internlm3 support (#11233)

RunningLeon committed 1 year ago

Verified 4dbc8b9c

CUDA: backwards pass for misc. ops, add tests (#11257)

JohannesGaessler committed 1 year ago

Verified 9c8dcefe

llama : add `llama_model_load_from_splits` (#11255)

ngxson committed 1 year ago

Verified 681149ce

ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)

fj-y-saito committed 1 year ago

Verified c67cc983

vulkan: scale caching for k quants + misc fixes (#11081)

netrunnereve committed 1 year ago

Verified adc5dd92

ci : use -no-cnv in gguf-split tests (#11254)

ggerganov committed 1 year ago

Verified f11cfdfd

fix: ggml: fix vulkan-shaders-gen build (#10448)

sparkleholic committed 1 year ago

Verified 1d850433

RoPE: fix back, CUDA support for back + noncont. (#11240)

JohannesGaessler committed 1 year ago

Verified 432df2d5

examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)

danbev committed 1 year ago

Verified 0ccd7f3e
