ggerganov/llama.cpp
Commits on branch codeplay/dequant_q4_K_improvements
Vectorize q load
Aidan committed 339 days ago · a235b7c5
Store scales in local mem
Aidan committed 339 days ago · 604ef6bf
Single load for half2
Aidan committed 339 days ago · cb3fb420
Remove double lines
Aidan committed 339 days ago · 4a481556
Merge pull request #7920 from ggerganov/codeplay/revert-host-alloc
joeatodd committed 342 days ago · Verified · ff076b88
Merge pull request #7919 from ggerganov/codeplay/unify-rope-sycl
joeatodd committed 342 days ago · Verified · b2c8c831
Replace powf with sycl::pow in ggml-sycl.cpp
joeatodd committed 342 days ago · ded54b5d
Revert "use the correct SYCL context for host USM allocations"
joeatodd committed 343 days ago · 18133cab
Formatting
joeatodd committed 343 days ago · abd7c7b8
[SYCL] Update unsupported ops
joeatodd committed 343 days ago · 0c0f3f00
[SYCL] unify rope norm/neox
joeatodd committed 343 days ago · 9b81b572
tests : add non-cont unary tests (#7857)
ggerganov committed 344 days ago · Verified · a9cae480
ggml : improve ggml_is_contiguous logic (#7856)
ggerganov committed 344 days ago · Verified · bfaa676b
server : restore numeric prompts (#7883)
ggerganov committed 344 days ago · Verified · 704a35b1
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
airMeng committed 344 days ago · Verified · dcf75270
Fix a typo and add Fedora 40 package to install for Vulkan (#7794) [no ci]
metal3d committed 345 days ago · Verified · f2b5764b
vulkan: select only one device for single gpu with multiple drivers (#7582)
Adriankhl committed 345 days ago · Verified · 73bac2b1
Update Vulkan RoPE implementation (#7818)
0cc4m committed 345 days ago · Verified · ef52d1d1
fix broken link in pr template (#7880) [no ci]
deven367 committed 345 days ago · Verified · 14f83526
github: move PR template to .github/ root (#7868)
mofosyne committed 345 days ago · Verified · 6fe42d07
llama-bench: more compact markdown tables (#7879)
JohannesGaessler committed 345 days ago · Verified · 148995e5
tests : check the Python version (#7872)
ggerganov committed 345 days ago · Verified · 4bfe50f7
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
JohannesGaessler committed 345 days ago · Verified · bdcb8f42
fix CUDA CI by using a windows-2019 image (#7861)
slaren committed 345 days ago · Verified · c2ce6c47
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)
ochafik committed 346 days ago · Verified · b61eb964
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
ochafik committed 346 days ago · Verified · 396b18df
cmake : fix CMake requirement for CUDA (#7821)
cebtenzzre committed 346 days ago · Verified · 864a99e7
ci : try win-2019 on server windows test (#7854)
slaren committed 346 days ago · Verified · fd5ea0f8
examples : remove --instruct remnants (#7846)
ggerganov committed 346 days ago · Verified · c28a8390
server : improve "prompt" handling (#7847)
ggerganov committed 346 days ago · Verified · d9da0e49