ggerganov/llama.cpp
Commits on branch vb/add-smollm3
99619529 up. (Vaibhavs10 committed 167 days ago)
97c64a09 up. (Vaibhavs10 committed 171 days ago)
6201b438 Update the graph. (Vaibhavs10 committed 186 days ago)
02ff0850 fix errors in conversion. (Vaibhavs10 committed 188 days ago)
32ea9c5f Model -> ModelBase. (Vaibhavs10 committed 188 days ago)
024bd294 Init - first pass. (Vaibhavs10 committed 188 days ago)
e434e691 common : suggest --jinja when autodetection fails (#14222) (CISC committed 189 days ago, Verified)
89fea80d server : fix incorrect usage of llama_get_embeddings() (#14225) (ggerganov committed 189 days ago, Verified)
6adc3c3e llama : add thread safety test (#14035) (slaren committed 189 days ago, Verified)
0dbcabde cmake: clean up external project logic for vulkan-shaders-gen (#14179) (bandoti committed 189 days ago, Verified)
ad590be9 model : add NeoBERT (#14164) (huydt84 committed 189 days ago, Verified)
7d6d91ba HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202) (IMbackK committed 189 days ago, Verified)
d3e64b9f llama : rework embeddings logic (#14208) (ggerganov committed 189 days ago, Verified)
3ba0d843 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206) (chaxu01 committed 189 days ago, Verified)
0bf49eb6 convert : remove arcee change in convert_hf_to_gguf_update.py (#14207) (bartowski1182 committed 189 days ago, Verified)
4ad24367 gguf-py : allow key override when adding value to GGUFWriter (#14194) (huydt84 committed 189 days ago, Verified)
c89c2d1a vulkan: mutex around vkQueueSubmit (#14127) (jeffbolznv committed 189 days ago, Verified)
3555b300 ggml-cpu : rework weak alias on apple targets (#14146) (xctan committed 189 days ago, Verified)
d7da8dc8 model : Add support for Arcee AI's upcoming AFM model (#14185) (bartowski1182 committed 189 days ago, Verified)
cd355eda server : When listening on a unix domain socket don't print http:// and port (#14180) (ericcurtin committed 189 days ago, Verified)
30e5b01d quantize : change int to unsigned int for KV overrides (#14197) (EAddario committed 190 days ago, Verified)
e54b3940 CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196) (IMbackK committed 190 days ago, Verified)
2c2caa44 HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183) (IMbackK committed 190 days ago, Verified)
5fce5f94 kv-cache : fix use-after-move of defrag info (#14189) (ggerganov committed 190 days ago, Verified)
9ae4143b model : add dots.llm1 architecture support (#14044) (#14118) (Noeda committed 190 days ago, Verified)
c311ac66 cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) (ggerganov committed 190 days ago, Verified)
b9912ac5 batch : auto-gen positions + verify multi-sequence input (#14177) (ggerganov committed 190 days ago, Verified)
00ba7726 docs : remove WIP since PR has been merged (#13912) (pepijndevos committed 190 days ago, Verified)
3cb203c8 llama-chat : Do not throw when tool parsing fails (#14012) (Piotr committed 191 days ago, Verified)
2e42be42 compare-llama-bench: add option to plot (#14169) (am17an committed 191 days ago, Verified)