ochafik/llama.cpp

support RETRIES=N in server test utils

ochafik committed 1 year ago

d5aff5af

tool-call: add `script/tool_bench.py`

ochafik committed 1 year ago

6703c395

tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)

ochafik committed 1 year ago

Verified 63e489c0

server : add TEI API format for /rerank endpoint (#11942)

ngxson committed 1 year ago

Verified 63ac1285

scripts: corrected encoding when getting chat template (#11866) (#11907)

MoonRide303 committed 1 year ago

Verified 5137da7b

docs : Fix duplicated file extension in test command (#11935)

xiaobing318 committed 1 year ago

Verified 09aaf4f1

CUDA: use async data loading for FlashAttention (#11894)

JohannesGaessler committed 1 year ago

Verified 73e2ed3c

update release requirements (#11897)

netrunnereve committed 1 year ago

Verified f7b1116a

server : fix divide-by-zero in metrics reporting (#11915)

aviallon committed 1 year ago

Verified c4d29baf

vulkan: implement several ops relevant for ggml_opt (#11769)

remyoudompheng committed 1 year ago

Verified 2eea03d8

server : bump httplib to 0.19.0 (#11908)

ngxson committed 1 year ago

Verified 0f2bbe65

common : Fix a typo in help (#11899)

standby24x7 committed 1 year ago

Verified fe163d5b

ci : fix (again) arm64 build fails (#11895)

ngxson committed 1 year ago

Verified 818a340e

vulkan: support multi/vision rope, and noncontiguous rope (#11902)

jeffbolznv committed 1 year ago

Verified bf42a23d

metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904)

halechan committed 1 year ago

Verified c2ea16f2

scripts: fix compare-llama-bench commit hash logic (#11891)

JohannesGaessler committed 1 year ago

Verified 6dde1782

examples: fix typo in imatrix/README.md (#11884)

708-145 committed 1 year ago

Verified fc10c38d

metal : optimize dequant q6_K kernel (#11892)

akretz committed 1 year ago

Verified 22885105

readme : add notice about new package registry (#11890)

ggerganov committed 1 year ago

Verified c2cd24fb

repo : update links to new url (#11886)

ggerganov committed 1 year ago

Verified 68ff663a

server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880)

ochafik committed 1 year ago

Verified f3552296

vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)

remyoudompheng committed 1 year ago

Verified fc1b0d09

llguidance build fixes for Windows (#11664)

mmoskal committed 1 year ago

Verified 89daa256

opencl: Fix rope and softmax (#11833)

lhez committed 1 year ago

Verified 300907b2

cuda : add ampere to the list of default architectures (#11870)

slaren committed 1 year ago

Verified 94b87f87

docker : drop to CUDA 12.4 (#11869)

ggerganov committed 1 year ago

Verified dbc2ec59

llama : add completion for --chat-template-file (#11860)

danbev committed 1 year ago

Verified 3d68f034

ggml: optimize some vec dot functions for LoongArch ASX (#11842)

MQ-mengqing committed 1 year ago

Verified 38e32eb6

vulkan: linux builds + small subgroup size fixes (#11767)

netrunnereve committed 1 year ago

Verified a4f011e8

llama-bench : fix unexpected global variable initialize sequence issue (#11832)

theraininsky committed 1 year ago

Verified a7b8ce22
