ngxson/llama.cpp
Commits
xsn/voxtral
Apply suggestions from code review
ngxson
committed
149 days ago
Verified
8c543f7c
correct project activation fn
ngxson
committed
149 days ago
4556b403
minor coding style improvement
ngxson
committed
149 days ago
01bf6872
fix regression for ultravox
ngxson
committed
149 days ago
8b2d72da
add docs and tests
ngxson
committed
149 days ago
738be198
also support Devstral conversion
ngxson
committed
149 days ago
b828887a
add [BEGIN_AUDIO] token
ngxson
committed
149 days ago
97119dd7
Merge branch 'master' into xsn/voxtral
ngxson
committed
149 days ago
49045bd3
quantize : update README.md (#14905)
EAddario
committed
149 days ago
Verified
7f975995
vulkan: add ops docs (#14900)
0cc4m
committed
150 days ago
Verified
bf78f543
SYCL: add ops doc (#14901)
qnixsynapse
committed
150 days ago
Verified
bbfc8492
llama : clarify comment about pp and tg graphs [no ci] (#14895)
danbev
committed
150 days ago
Verified
ca0ef2dd
vulkan : add fp16 support for the conv_2d kernel (#14872)
Green-Sky
committed
150 days ago
Verified
89d10295
vulkan: skip empty set_rows to avoid invalid API usage (#14860)
jeffbolznv
committed
150 days ago
Verified
f1a4e72d
model : make rope_yarn_log_mul optional for deepseek2 (#14896)
gabriellarson
committed
150 days ago
Verified
4762ad73
llama : fix kq_scale for the attention layers of PLaMo2 (#14892)
mitmul
committed
150 days ago
Verified
1dc9614e
Docs: add instructions for adding backends (#14889)
am17an
committed
150 days ago
Verified
446595b9
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624)
deepsek
committed
150 days ago
Verified
66906cd8
CANN: Implement GLU ops (#14884)
hipudding
committed
151 days ago
Verified
11dd5a44
musa: fix build warnings (unused variable) (#14869)
yeahdongcn
committed
151 days ago
Verified
9b8f3c6c
ggml-cpu : disable GGML_NNPA by default due to instability (#14880)
taronaeo
committed
152 days ago
Verified
c7f3169c
metal: SSM_SCAN performance (#14743)
gabe-l-hart
committed
152 days ago
Verified
793c0d7f
opencl: add fused `rms_norm_mul` (#14841)
lhez
committed
152 days ago
Verified
ce111d39
docs : update HOWTO‑add‑model.md for ModelBase and new model classes (#14874)
wooksong
committed
152 days ago
Verified
e7fecba9
ggml : remove invalid portPos specifiers from dot files (#14838)
ORippler
committed
152 days ago
Verified
e2b7621e
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870)
ggerganov
committed
152 days ago
Verified
c1dbea75
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)
kiwi142857
committed
152 days ago
Verified
749e0d27
rpc : check for null buffers in get/set/copy tensor endpoints (#14868)
struct
committed
152 days ago
Verified
64bf1c37
sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
slaren
committed
152 days ago
Verified
c12bbde3
fix python requirements
ngxson
committed
152 days ago
2da31edd