ngxson/llama.cpp


include "ggml-cpu.h"

ngxson committed 254 days ago

c37252b1

docker : do not build tests

ngxson committed 254 days ago

991958a0

rpc : fix cache directory initialization (#13188)

hbuxiaofei committed 254 days ago

Verified a0f7016d

scripts: n_depth for compare-llama-bench [no ci] (#13201)

JohannesGaessler committed 255 days ago

Verified 19e899ce

server : Prefilling assistant message in openai compatible API (#13174)

matteoserva committed 255 days ago

Verified e2e1ddb9

sampling : when top-k <= 0 -> noop (#13173)

ggerganov committed 255 days ago

Verified d9d398f8

llama-bench: fixed size of fields to correctly map to values (#13183)

Alberto Cabrera Pérez committed 255 days ago

Verified 5a639801

CUDA: fix non-cont. inputs for batched mat mul (#13155)

JohannesGaessler committed 255 days ago

Verified cdf76586

llama : llm_type order by size (#13177)

CISC committed 255 days ago

Verified 7d3af70b

mtmd : add qwen2vl and qwen2.5vl (#13141)

ngxson committed 255 days ago

Verified 00e3e5a1

llama : set qwen3 model type sizes (#13175)

CISC committed 255 days ago

Verified e98b3692

llama-graph : fix text position for mrope (#13159)

ngxson committed 255 days ago

Verified b6ce7430

model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466)

manyoso committed 256 days ago

Verified 5f5e39e1

clip : fix model size display (#13153)

ngxson committed 256 days ago

Verified eaea3253

fix(rpc): Improve input validation and error handling (#13069)

thevilledev committed 256 days ago

Verified 43ddab6e

llama-bench: add `-d` depth arg (#13096)

thevishalagarwal committed 256 days ago

Verified 1831f538

mtmd : fix glm-edge redundant token count (#13139)

ngxson committed 256 days ago

Verified 4e87962e

context : do not clear output buffer on reserve (#13152)

pockers21 committed 256 days ago

Verified fb0471d1

llama : (mrope) allow using normal 1D position for text token (#13138)

ngxson committed 256 days ago

Verified d2b2031e

clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)

ngxson committed 256 days ago

Verified 5fa9e63b

SYCL: Add all missing unary kernels (#13074)

qnixsynapse committed 256 days ago

Verified a4c340f9

readme : update hot topics (#13150)

ggerganov committed 256 days ago

Verified d0a417f3

common : fix noreturn compile warning (#13151)

ggerganov committed 256 days ago

Verified 43f2b071

llama-chat : fix typo GML --> GLM (#13143)

ngxson committed 256 days ago

Verified e5d6c255

musa: fix typo in cc control (#13144)

yeahdongcn committed 256 days ago

Verified f0dd6a19

CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137)

JohannesGaessler committed 256 days ago

Verified 69699be4

arg : fix unused variable (#13142)

ngxson committed 256 days ago

Verified 85f36e5e

llama-bench : Add `--override-tensors` arg (#12922)

4onen committed 257 days ago

Verified c0a97b76

llama-chat : fix wrong template in GLM4-0414 (#13140)

matteoserva committed 257 days ago

Verified ced44be3

musa: fix build warning (#13129)

yeahdongcn committed 257 days ago

Verified e291450b
