ggerganov/llama.cpp
Commits
update_sycl_doc
update guide
Neo Zhang
committed
1 year ago
7764ab91
ggml-backend : fix async copy from CPU (#8897)
slaren
committed
1 year ago
Verified
be55695e
[SYCL] Updated SYCL device filtering (#8901)
OuadiElfarouki
committed
1 year ago
Verified
0478174d
CUDA/HIP: fix tests/test-backend-ops (#8896)
JohannesGaessler
committed
1 year ago
Verified
a8dbc6f7
llama-bench : add support for getting cpu info on Windows (#8824)
kylo5aby
committed
1 year ago
Verified
506122d8
quantize : update usage comment in quantize.cpp (#8889)
danbev
committed
1 year ago
Verified
725e3d94
typo correction (#8891)
Nexesenex
committed
1 year ago
Verified
31958546
server : add lora hotswap endpoint (WIP) (#8857)
ngxson
committed
1 year ago
Verified
1e6f6554
CUDA: fix padding logic for FP16/FP32 (#8884)
JohannesGaessler
committed
1 year ago
Verified
641f5dd2
simple : update name of executable to llama-simple (#8885)
danbev
committed
1 year ago
Verified
5f4dcb1e
cmake : Link vulkan-shaders-gen with pthreads (#8835)
Patater
committed
1 year ago
Verified
db20f50c
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `e31a4f6` (#8880)
MaggotHATE
committed
1 year ago
Verified
efda90c9
contributing : add note about write access
ggerganov
committed
1 year ago
Verified
0bf16de0
ggml : add epsilon as a parameter for group_norm (#8818)
MollySophia
committed
1 year ago
Verified
2d5dd7bb
convert : add support for XLMRoberta embedding models (#8658)
iamlemec
committed
1 year ago
Verified
cdd1889d
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)
MengqingCao
committed
1 year ago
Verified
c21a8964
[SYCL] correct cmd name (#8877)
arthw
committed
1 year ago
Verified
d4ff8471
common : Changed tuple to struct (TODO fix) (#8823)
Septa2112
committed
1 year ago
Verified
0a4ce786
cann: fix buffer_num and runtime speed slowly error (#8865)
wangshuai09
committed
1 year ago
Verified
bc0f887e
readme : add ramalama to the available UIs (#8811)
ericcurtin
committed
1 year ago
Verified
b42978e7
ggml : fix overflows in elu function (#8866)
jart
committed
1 year ago
Verified
b9dfc25c
py: Add more authorship metadata from model card (#8810)
mofosyne
committed
1 year ago
Verified
1ef14b30
Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)
fairydreaming
committed
1 year ago
Verified
d3f0c716
cmake: fix paths for vulkan shaders compilation on Windows (#8573)
stduhpf
committed
1 year ago
Verified
e31a4f67
readme : update model list (#8851)
BarfingLemurs
committed
1 year ago
Verified
400ae6f6
llama : better replace_all (#8852)
ggerganov
committed
1 year ago
Verified
f1ea5146
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)
0cc4m
committed
1 year ago
Verified
064cdc26
sync : ggml
ggerganov
committed
1 year ago
5587e57a
vulkan : implement Stable Diffusion operators (ggml/904)
0cc4m
committed
1 year ago
a3738b2f
ggml : move c parameter comment to ggml_rope_ext (ggml/901)
danbev
committed
1 year ago
655858ac