ngxson/llama.cpp
Commits
xsn/docker_no_build_test
c37252b1  include "ggml-cpu.h"  (ngxson, 230 days ago)
991958a0  docker : do not build tests  (ngxson, 230 days ago)
a0f7016d  rpc : fix cache directory initialization (#13188)  (hbuxiaofei, 230 days ago, Verified)
19e899ce  scripts: n_depth for compare-llama-bench [no ci] (#13201)  (JohannesGaessler, 230 days ago, Verified)
e2e1ddb9  server : Prefilling assistant message in openai compatible API (#13174)  (matteoserva, 230 days ago, Verified)
d9d398f8  sampling : when top-k <= 0 -> noop (#13173)  (ggerganov, 230 days ago, Verified)
5a639801  llama-bench: fixed size of fields to correctly map to values (#13183)  (Alberto Cabrera Pérez, 230 days ago, Verified)
cdf76586  CUDA: fix non-cont. inputs for batched mat mul (#13155)  (JohannesGaessler, 230 days ago, Verified)
7d3af70b  llama : llm_type order by size (#13177)  (CISC, 231 days ago, Verified)
00e3e5a1  mtmd : add qwen2vl and qwen2.5vl (#13141)  (ngxson, 231 days ago, Verified)
e98b3692  llama : set qwen3 model type sizes (#13175)  (CISC, 231 days ago, Verified)
b6ce7430  llama-graph : fix text position for mrope (#13159)  (ngxson, 231 days ago, Verified)
5f5e39e1  model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466)  (manyoso, 231 days ago, Verified)
eaea3253  clip : fix model size display (#13153)  (ngxson, 231 days ago, Verified)
43ddab6e  fix(rpc): Improve input validation and error handling (#13069)  (thevilledev, 231 days ago, Verified)
1831f538  llama-bench: add `-d` depth arg (#13096)  (thevishalagarwal, 231 days ago, Verified)
4e87962e  mtmd : fix glm-edge redundant token count (#13139)  (ngxson, 231 days ago, Verified)
fb0471d1  context : do not clear output buffer on reserve (#13152)  (pockers21, 231 days ago, Verified)
d2b2031e  llama : (mrope) allow using normal 1D position for text token (#13138)  (ngxson, 231 days ago, Verified)
5fa9e63b  clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)  (ngxson, 232 days ago, Verified)
a4c340f9  SYCL: Add all missing unary kernels (#13074)  (qnixsynapse, 232 days ago, Verified)
d0a417f3  readme : update hot topics (#13150)  (ggerganov, 232 days ago, Verified)
43f2b071  common : fix noreturn compile warning (#13151)  (ggerganov, 232 days ago, Verified)
e5d6c255  llama-chat : fix typo GML --> GLM (#13143)  (ngxson, 232 days ago, Verified)
f0dd6a19  musa: fix typo in cc control (#13144)  (yeahdongcn, 232 days ago, Verified)
69699be4  CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137)  (JohannesGaessler, 232 days ago, Verified)
85f36e5e  arg : fix unused variable (#13142)  (ngxson, 232 days ago, Verified)
c0a97b76  llama-bench : Add `--override-tensors` arg (#12922)  (4onen, 232 days ago, Verified)
ced44be3  llama-chat : fix wrong template in GLM4-0414 (#13140)  (matteoserva, 232 days ago, Verified)
e291450b  musa: fix build warning (#13129)  (yeahdongcn, 233 days ago, Verified)