ochafik/llama.cpp

enable dpcpp nightly builds with libraries (#13406)

AD2605 committed 328 days ago

Verified 14492144

mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)

city96 committed 328 days ago

Verified c1040239

tools : fix uninitialized llama_batch in server (#13436)

aumfer committed 329 days ago

Verified 9a390c48

scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)

CISC committed 329 days ago

Verified 09232370

CUDA: fix crash with partial offloading of MoE (#13439)

JohannesGaessler committed 329 days ago

Verified 7474e00b

Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)

hjc4869 committed 329 days ago

Verified 7f323a58

mtmd : support InternVL 3 38B and 78B mmproj (#13443)

city96 committed 329 days ago

Verified 3eac2093

mtmd : move helpers to dedicated file (#13442)

ngxson committed 329 days ago

Verified a634d75d

docs : Fix typo in InternVL3 model name (#13440)

99991 committed 330 days ago

Verified 62d4250e

CUDA: fix race conditions FlashAttention kernels (#13438)

JohannesGaessler committed 330 days ago

Verified 0208355f

vocab : add ByteDance-Seed/Seed-Coder (#13423)

CISC committed 330 days ago

Verified d2a4ef05

mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)

ngxson committed 330 days ago

Verified 15e6125a

server : update docs (#13432)

ngxson committed 330 days ago

Verified 3b24d26c

llguidance : set tokenizer slices to default (#13424)

CISC committed 330 days ago

Verified 43dfd741

ci: free_disk_space flag enabled for intel variant (#13426)

Thammachart committed 330 days ago

Verified b064a51a

mtmd : support InternVL 2.5 and 3 (#13422)

ngxson committed 330 days ago

Verified 053367d1

CUDA: fix FlashAttention on Turing (#13415)

JohannesGaessler committed 330 days ago

Verified d8919424

arg : add env var to control mmproj (#13416)

ngxson committed 330 days ago

Verified 7fef1176

vulkan: scalar flash attention implementation (#13324)

jeffbolznv committed 330 days ago

Verified dc1d2adf

chore(llguidance): use tagged version that does not break the build (#13413)

HRKings committed 331 days ago

Verified 7c28a74e

server : vision support via libmtmd (#12898)

ngxson committed 331 days ago

Verified 33eff402

sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)

Alberto Cabrera Pérez committed 331 days ago

Verified 17512a94

metal : optimize MoE for large batches (#13388)

ggerganov committed 331 days ago

Verified 611aa914

CUDA: FA support for Deepseek (Ampere or newer) (#13306)

JohannesGaessler committed 331 days ago

Verified 0cf6725e

llama : do not crash if there is no CPU backend (#13395)

slaren committed 331 days ago

Verified 27ebfcac

CUDA: fix crash on large batch size for MoE models (#13384)

JohannesGaessler committed 331 days ago

Verified 5c86c9ed

imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)

bartowski1182 committed 331 days ago

Verified efb8b47e

llama-run: add support for downloading models from ModelScope (#13370)

yeahdongcn committed 331 days ago

Verified 0527771d

mtmd : fix batch_view for m-rope (#13397)

ngxson committed 331 days ago

Verified 2189fd3b

llama : one-off chat template fix for Mistral-Small-2503 (#13398)

ngxson committed 331 days ago

Verified 3f96aeff
