ggerganov/llama.cpp
Commits
xsn/tmp_jinja_safer
safer jinja `llama_chat_templates` struct
ngxson
committed
227 days ago
c9e7cbb0
minja: fix vigogne (https://github.com/google/minja/pull/22)
ochafik
committed
229 days ago
cc503564
Disable jinja test that has a cryptic windows failure
ochafik
committed
229 days ago
e3c475cd
Add missing optional include to server.cpp
ochafik
committed
229 days ago
0e74c9da
Rm unused optional include
ochafik
committed
229 days ago
fc60802b
Fix copy elision warning
ochafik
committed
229 days ago
5074e6fe
Flush stdout in chat template before potential crash
ochafik
committed
229 days ago
33322e82
Forward decl minja::chat_template to avoid eager json dep
ochafik
committed
229 days ago
e63520f3
Normalize newlines in test-chat-templates for windows tests
ochafik
committed
230 days ago
ee1e10e2
Revert LLAMA_CHATML_TEMPLATE refactor
ochafik
committed
230 days ago
d5fa351a
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
ochafik
committed
230 days ago
81c0d437
Merge remote-tracking branch 'origin/master' into jinja
ochafik
committed
230 days ago
40db7896
Refactor common_chat_* functions to accept minja template + use_jinja option
ochafik
committed
230 days ago
b75d0622
llama.android: add field formatChat to control whether to parse special tokens when send message (#11270)
codezjx
committed
230 days ago
Verified
3edfa7d3
rpc : early register backend devices (#11262)
rgerganov
committed
230 days ago
Verified
667d7284
vocab : fix double-eos check (#11273)
ggerganov
committed
231 days ago
Verified
a133566d
llama : fix deprecation message: vocabable -> vocab (#11269)
dwrensha
committed
231 days ago
Verified
960ec652
README : added kalavai to infrastructure list (#11216)
musoles
committed
231 days ago
Verified
7a689c41
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)
jeffbolznv
committed
231 days ago
Verified
bd38ddea
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)
jeffbolznv
committed
231 days ago
Verified
466300fe
vulkan: optimize coopmat2 q2_k dequant function (#11130)
jeffbolznv
committed
231 days ago
Verified
206bc534
llama : add internlm3 support (#11233)
RunningLeon
committed
231 days ago
Verified
4dbc8b9c
CUDA: backwards pass for misc. ops, add tests (#11257)
JohannesGaessler
committed
231 days ago
Verified
9c8dcefe
llama : add `llama_model_load_from_splits` (#11255)
ngxson
committed
231 days ago
Verified
681149ce
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)
fj-y-saito
committed
231 days ago
Verified
c67cc983
vulkan: scale caching for k quants + misc fixes (#11081)
netrunnereve
committed
232 days ago
Verified
adc5dd92
ci : use -no-cnv in gguf-split tests (#11254)
ggerganov
committed
232 days ago
Verified
f11cfdfd
fix: ggml: fix vulkan-shaders-gen build (#10448)
sparkleholic
committed
232 days ago
Verified
1d850433
RoPE: fix back, CUDA support for back + noncont. (#11240)
JohannesGaessler
committed
232 days ago
Verified
432df2d5
examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)
danbev
committed
233 days ago
Verified
0ccd7f3e