ochafik/llama.cpp

Update test-cli.cpp

ochafik committed 1 year ago

243fd5dd

`test-cli`: add llama-cli as order-only prerequisite in Makefile

ochafik committed 1 year ago

ca512cc9

`test-cli`: greedy sampling + print exception messages

ochafik committed 1 year ago

82d5e91a

`main`: add test-cli + ensure output still logged w/ --log-disable

ochafik committed 1 year ago

030fda09

kompute: add backend registry / device interfaces (#10045)

slp committed 1 year ago

Verified 61408e7f

ggml : fix memory leaks when loading invalid gguf files (#10094)

slaren committed 1 year ago

Verified b9e02e81

readme : more lora detail in main example readme (#10064)

richdougherty committed 1 year ago

Verified 6763f713

convert : more detailed convert lora usage docs (#10065)

richdougherty committed 1 year ago

Verified 79a2bc04

ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)

xctan committed 1 year ago

Verified fc83a9e5

llama : refactor model loader with backend registry (#10026)

slaren committed 1 year ago

Verified c5b0f4b5

ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)

cyzero-kim committed 1 year ago

Verified 8f275a7c

llama : remove Tail-Free sampling (#10071)

ggerganov committed 1 year ago

Verified 8d8ff715

llama : Add IBM granite template (#10013)

arch-btw committed 1 year ago

Verified 61715d5c

flake.lock: Update (#10063)

ggerganov committed 1 year ago

Verified 07028f9d

musa: workaround for Guilty Lockup in cleaning src0 (#10042)

yeahdongcn committed 1 year ago

Verified 524afeec

server : don't overfill the batch during infill (#10018)

ggerganov committed 1 year ago

Verified 8125e6cb

llama : switch KQ multiplication to F32 precision by default (#10015)

ggerganov committed 1 year ago

Verified 8841ce3f

ggerganov committed 1 year ago

Verified cc2983d3

increase cuda_cpy block size (ggml/996)

bssrdf committed 1 year ago

Verified 8c60a8a4

scripts : fix amx sync [no ci]

ggerganov committed 1 year ago

Verified 9e4a2563

metal : support permuted matrix multiplications (#10033)

ggerganov committed 1 year ago

Verified 66875035

llama : add DRY sampler (#9702)

wwoodsTM committed 1 year ago

Verified ff252ea4

llama: string_split fix (#10022)

Xarbirus committed 1 year ago

Verified d80fb71f

llamafile : extend sgemm.cpp support for Q5_0 models (#10010)

Srihari-mcw committed 1 year ago

Verified 2f8bd2b9

server : check that the prompt fits in the slot's context (#10030)

ggerganov committed 1 year ago

Verified bc5ba007

server : refactor slot input data, move tokenizer to HTTP thread (#10023)

ngxson committed 1 year ago

Verified 958367bf

ci : fix cmake flags for SYCL

ggerganov committed 1 year ago

Verified 40f25557

CUDA: fix insufficient buffer clearing for MMQ (#10032)

JohannesGaessler committed 1 year ago

Verified 167a5156

CUDA: fix MMQ for non-contiguous src0, add tests (#10021)

JohannesGaessler committed 1 year ago

Verified c39665f5

server : samplers accept the prompt correctly (#10019)

wwoodsTM committed 1 year ago

Verified 0a1c750c
