ochafik/llama.cpp

server: support add_generation_prompt query param

ochafik committed 1 year ago

4e643f1e

run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041)

benoitf committed 1 year ago

Verified 7ad0779f

Some llama-run cleanups (#11973)

ericcurtin committed 1 year ago

Verified f777a73e

ggml-cpu: Support s390x SIMD Instruction Set (#12019)

taronaeo committed 1 year ago

Verified af7747c9

CUDA: app option to compile without FlashAttention (#12025)

JohannesGaessler committed 1 year ago

Verified a28e0d5e

llava: build clip image from pixels (#11999)

tinglou committed 1 year ago

Verified 36c258ee

ci : fix arm upload artifacts (#12024)

ggerganov committed 1 year ago

Verified f3e64859

CUDA: optimize FA for GQA + large batches (#12014)

JohannesGaessler committed 1 year ago

Verified 5fa07c2f

ci : Build on Github-hosted arm64 runners (#12009)

Rohanjames1997 committed 1 year ago

Verified 335eb04a

server : disable Nagle's algorithm (#12020)

ggerganov committed 1 year ago

Verified cf756d6e

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000)

gcp committed 1 year ago

Verified d7090842

llama.swiftui : add "Done" dismiss button to help view (#11998)

danbev committed 1 year ago

Verified de8b5a36

llama : skip loading unused tensors (#12004)

ggerganov committed 1 year ago

Verified 51f311e0

doc: update contributing guidelines [no ci] (#11969)

JohannesGaessler committed 1 year ago

Verified 586d5fe6

CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)

PureJourney committed 1 year ago

Verified ecc8e3ae

MUSA: support ARM64 and enable dp4a .etc (#11843)

BodhiHu committed 1 year ago

Verified 0b3863ff

clip : fix visual encoders with no CLS (#11982)

alex-jw-brooks committed 1 year ago

Verified ee02ad02

server (webui): Fix Premature Submission During IME Conversion (#11971)

mmngays committed 1 year ago

Verified c392e509

ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

chaxu01 committed 1 year ago

Verified c5d91a74

ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)

Vithulep committed 1 year ago

Verified 4806498b

run : add --chat-template-file (#11961)

engelmi committed 1 year ago

Verified 0d559580

doc: add links to ggml examples [no ci] (#11958)

JohannesGaessler committed 1 year ago

Verified d04e7163

common : add llama.vim preset for Qwen2.5 Coder (#11945)

danbev committed 1 year ago

Verified d07c6213

speculative : update default params (#11954)

ggerganov committed 1 year ago

Verified abd4d0bc

llama : fix indentation in llama-grammar [no ci] (#11943)

danbev committed 1 year ago

Verified 9626d935

server : (webui) Enable communication with parent html (if webui is in iframe) (#11940)

igardev committed 1 year ago

Verified b58934c1

tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)

ochafik committed 1 year ago

Verified 63e489c0

server : add TEI API format for /rerank endpoint (#11942)

ngxson committed 1 year ago

Verified 63ac1285

scripts: corrected encoding when getting chat template (#11866) (#11907)

MoonRide303 committed 1 year ago

Verified 5137da7b

docs : Fix duplicated file extension in test command (#11935)

xiaobing318 committed 1 year ago

Verified 09aaf4f1
