ngxson/llama.cpp
Branch: xsn/server-mistral-template
Branches:
debug_server_pref
gemma3n_mtmd
hp/split/load-model
master
poc/vision
tmp0
wsn/server_health_non_blocking
xsn/a11y
xsn/accept_pdf
xsn/add_n_support
xsn/agents_md
xsn/arch_refactor_llm_names
xsn/arg_add_catalog
xsn/arg_better_handle_hf_mmproj
xsn/arg_cpp
xsn/arg_ctk_ctv
xsn/arg_missing_ifdef
xsn/arg_mm
xsn/arg_neg_fix
xsn/arg_neg
xsn/arg_unused_var
xsn/argparser_v3
xsn/asan_arg_smpl
xsn/better_error
xsn/better_server_json_value
xsn/bug_report_add_cmd
xsn/bump_transformers
xsn/cache_missing_slash
xsn/cache_model_list
xsn/cancellable_request
xsn/chafik_webui_mcp_idea
xsn/chat_apply_template
xsn/chat_cli
xsn/chat_int_overflow
xsn/chat_template_prefix_postfix
xsn/chat_tmpl_alias
xsn/chat_tmpl_enumerate
xsn/check_vendor_ci
xsn/ci_cpu_ubuntu_20
xsn/ci_docker_no_fast_fail
xsn/ci_fix_arm64
xsn/ci_fix_arm64_2
xsn/ci_ggml_org_hosted
xsn/ci-permission
xsn/clean_up_server
xsn/cleanup_oai
xsn/cli_arrow_left_right
xsn/cli_auto_cnv
xsn/cli_buffered_logs
xsn/cli_command
xsn/cli_jinja_default
xsn/cli_move_warning
xsn/cli_server_based
xsn/clip_ffn_up_down_fix
xsn/clip_fix_model_size_display
xsn/clip_gpu
xsn/clip_improve_concat
xsn/clip_no_mmproj_offload
xsn/clip_no_print_ftype
xsn/clip_preprocessing_refactor
xsn/clip_proj_naming
xsn/clip_refactor_img_manip
xsn/clip_refactor_set_input
xsn/clip_refactor_smaller_files
xsn/clip_smart_ptr
xsn/codeowners
xsn/codeowners2
xsn/common_cpp_no_json
xsn/common_remote_get_content
xsn/compare_logits
xsn/control-vector-generator
xsn/control-vector-multiprompt
xsn/convert_fix_llama4_clash
xsn/convert_gguf_qwen2vl
xsn/convert_improve_arch_handling
xsn/convert_kimi_k2_quant_repack
xsn/convert_kimi_k2_quant
xsn/convert_mmproj_type_mean_std
xsn/convert_mmproj
xsn/convert_update_qol
xsn/correct_llama2_template
xsn/cors_proxy_demo
xsn/create_server_context
xsn/csm_tts_batched_decode
xsn/csm_tts
xsn/csv_arg
xsn/curl_ci_test
xsn/curl_on_by_default
xsn/curl_static
xsn/custom_swa_list
xsn/cvector_fix_pca
xsn/cvector-better-prompt
xsn/cvector-fix
xsn/deepseek_r1_qwen
xsn/deepseek-ocr
xsn/defer-server-task
xsn/devstral2_convert
xsn/disallow_remote_code_convert
xsn/docker_no_build_test
xsn/docs-sycl-vulkan
xsn/dotsllm1
xsn/dotsocr
xsn/download_cpp
xsn/duplicated_tensor_name
xsn/embedding_input
xsn/emscripten_webgpu
xsn/env_var_speculative
xsn/exaone_tied_embd
xsn/exceed_context_size_error
xsn/fix_audio_patch_size_zero
xsn/fix_chat_tmpl
xsn/fix_ci_test
xsn/fix_ci
xsn/fix_console_backspace
xsn/fix_curl_old_ver
xsn/fix_docker_ci
xsn/fix_empty_batch
xsn/fix_emscripten_build
xsn/fix_export_lora_2
xsn/fix_gemma2_tokenizer
xsn/fix_gemma3n_conversion
xsn/fix_get_weights
xsn/fix_imatrix_arg
xsn/fix_kimi_k2_tmpl
xsn/fix_kv_shift_qwen2vl
xsn/fix_llam4_conversion
xsn/fix_llama_api_missing
xsn/fix_llama_lora
xsn/fix_logprobs
xsn/fix_lora_convert
xsn/fix_lora_merge
xsn/fix_lora_merge_2
xsn/fix_lora
xsn/fix_main_cnv_tmpl
xsn/fix_metal_im2col
xsn/fix_mistral_chat_format
xsn/fix_order_unary_ops
xsn/fix_qwen_omni_conv
xsn/fix_qwen3_nb
xsn/fix_res_error
xsn/fix_router_ssl
xsn/fix_server_chat_template
xsn/fix_server_test_exit
xsn/fix_slow_ci
xsn/fix_swa_freq
xsn/fix_sys_prompt
xsn/fix_test_timeout
xsn/fix_uhd_preprocessing
xsn/fix_ui_copy_function
xsn/fix_unsupported_chat_tmpl
xsn/fix_url_mismatch
xsn/fix-async-iterator-safari
xsn/fix-fattn-qwen25vl
xsn/fix-mrope
xsn/fix-mrope-asan-error
xsn/fix-mrope-causal
xsn/fix-server-task-lock
xsn/flash_attn_lora
xsn/full_image_less
xsn/gelu_erf_cu
xsn/gelu_na
xsn/gemma_template
xsn/gemma2_mask_swa
xsn/gemma3_lm_head
xsn/gemma3n_audio
xsn/gemma3n
xsn/gemma-multiple-system-role
xsn/ggml_cast_f32_i32
xsn/ggml_fill
xsn/ggml_repeat_4d
xsn/ggml_scale_bias
xsn/gguf_cpp_wrapper
xsn/gguf-split-size
xsn/glm4v
xsn/gptoss_non_mxfp4_conversion
xsn/helium_test
xsn/hf_offline
xsn/hf_repo_hf_file_duplicate_name
xsn/hf_repo
xsn/homecook-mistral-o
xsn/httplib_cpp_h
xsn/httplib_0_19_0
xsn/hunyuan-moe
xsn/idefics3-fix-preproc
xsn/improve_common_log
xsn/improve_server_ui
xsn/improve_server_works
xsn/improve-gen-docs
xsn/intel-oneapi
xsn/internvl
xsn/janus_pro
xsn/jinja_vm
xsn/kimi-vl
xsn/lazy_remote_tensor
xsn/lfm2_missing_tensor
xsn/lfm2_vl
xsn/lighton-ocr
xsn/llama_batch_remove_compat
xsn/llama_chat_tmpl_docs
xsn/llama_cpp_lib
xsn/llama_decode_enum
xsn/llama_lora_adapter_clear
xsn/llama_model_load_from_splits_cli
xsn/llama_model_load_from_splits
xsn/llama_set_attn_type_backup
xsn/llama_set_attn_type
xsn/llama4causal
xsn/llama4causalfix
xsn/llama4_mapping
xsn/llama4_rms_norm
xsn/llama4_scaling
xsn/llama4
xsn/llamax-demo
xsn/llava2
xsn/load_from_buffer
xsn/local_media_path
xsn/lora_convert_base_is_optional
xsn/lora_new_tokens_warn
xsn/lora_per_request
xsn/lora_server_hotswap
xsn/main_chat_template
xsn/main_chat_template_2
xsn/main_tmpl_preserve_nl
xsn/makefile_missing
xsn/master_test_decode_count
xsn/mem_hybrid_iswa
xsn/memleak_mtmd_helper
xsn/merge_llava_to_mtmd_cli
xsn/mergekit_extract_lora_compat
xsn/mimi_dec
xsn/minicpm_template2
xsn/minicpm-template
xsn/minicpmv_cli_fix
xsn/minicpmv-improve-sincos-embd
xsn/ministral3_quantized
xsn/ministral3
xsn/minor_fix_ui
xsn/missing-args
xsn/mistral_large_moe
xsn/mistral_large_scaling
xsn/mistral_small_vision
xsn/mistral_small
xsn/model_merge_with_embd
xsn/model_merge
xsn/more_try_catch_server
xsn/move_llava_to_mtmd
xsn/mrope_metal
xsn/mrope_normal_pos_text
xsn/mtmd_ai_gen_pr
xsn/mtmd_better_init_struct
xsn/mtmd_c_api
xsn/mtmd_cleanup_n_patches
xsn/mtmd_clip_private
xsn/mtmd_docs
xsn/mtmd_fix_batch_view_mrope
xsn/mtmd_fix_no_warmup
xsn/mtmd_fix_pub_header
xsn/mtmd_glmedge_rm_boi_eoi
xsn/mtmd_graph_builder_refactor
xsn/mtmd_helper_dedicated_file
xsn/mtmd_helper_dedicated_lib
xsn/mtmd_image_api
xsn/mtmd_improve_0
xsn/mtmd_llama4_new
xsn/mtmd_no_internal
xsn/mtmd_optimize_2d_rope
xsn/mtmd_pixtral
xsn/mtmd_qwen2vl_reduce_img_size
xsn/mtmd_qwen2vl
xsn/mtmd_refactor_audio_preproc
xsn/mtmd_remove_legacy
xsn/mtmd_rm_glm_eoi_boi
xsn/mtmd_set_log
xsn/mtmd_smolvlm
xsn/mtmd_ultravox
xsn/mtmd_warmup_bool
xsn/mtmd-cli-jinja
xsn/mtmd-initial-video-api
xsn/mtmd-max-min-pixels
xsn/need_insert_eot
xsn/nemotron-chat-template
xsn/nits_smollm3
xsn/no_curl_ggml_ci
xsn/no_n_predict_minus_2
xsn/no-warmup-arg
xsn/norway_problem
xsn/oai_add_system_fingerprint
xsn/oai_completions
xsn/oneoff_fix_mistral_tmpl
xsn/orion_chat_tmpl
xsn/paddleocr
xsn/phi3-convert
xsn/phi4_tmpl
xsn/phi-3-default-swa
xsn/phi-4-mm
xsn/pin_ci
xsn/pixtral_fix_backend
xsn/poc_cli_server_based
xsn/poc_interim_server
xsn/poc_proxy_router
xsn/poc_proxy_2
xsn/poc_proxy_3
xsn/poc_server_audio_gen
xsn/private_batch_api
xsn/python_quantize_k
xsn/quantize_mtmd
xsn/qwen_allow_large_img_default
xsn/qwen_embd_pooling
xsn/qwen_vl_max_res
xsn/qwen2audio
xsn/qwen2vl_fix_text_pos
xsn/qwen3a
xsn/qwen3_embd_rerank
xsn/qwen25omni
xsn/readme_deps
xsn/redo_quant_threads
xsn/reduce_compile_time_arg
xsn/refactor_clip
xsn/refactor_cpu_dup_op
xsn/refactor_download
xsn/refactor_server_multitask_test
xsn/refactor_server_multitask
xsn/refactor_server_preset
xsn/refactor_server_slot_input
xsn/refactor_server_struct_input
xsn/refactor_server_struct_type
xsn/remote_preset
xsn/remove_train_fintune
xsn/renaming_server
xsn/reorganize_docs
xsn/rerank_tei_format
xsn/revert_rm_boi_eoi
xsn/revert_rm_timings
xsn/rework_get_started_docs
xsn/rm_extra_args_docs
xsn/rm_inp_one
xsn/rope_v2
xsn/router_cmd_stdout
xsn/router_no_content_length
xsn/server_anthropic_fix
xsn/server_audio
xsn/server_bench_docker
xsn/server_chat_cmpl_model
xsn/server_chat_template_detect
xsn/server_chat_template
xsn/server_clarify_kvu_np
xsn/server_clarify_slots
xsn/server_connection_is_alive
xsn/server_custom_tmpl
xsn/server_data_race
xsn/server_dev_docs
xsn/server_echo_logprobs_stream
xsn/server_embd_multitask
xsn/server_empty_prompt
xsn/server_explicit_access
xsn/server_explicit_exec_path
xsn/server_fix_stream_cancel
xsn/server_fix_2
xsn/server_functionary
xsn/server_improve_msg_diff
xsn/server_improve_spec
xsn/server_jinja_enabled_default
xsn/server_lightweight_chat_ui
xsn/server_missing_model_id
xsn/server_model_management_v1_2
xsn/server_models_autoload
xsn/server_more_args
xsn/server_more_tests
xsn/server_mtmd
xsn/server_no_cache_bug
xsn/server_no_err_out_of_ctx
xsn/server_node_22_11_0
xsn/server_params_2
xsn/server_preset_common_section
xsn/server_progress_zero
xsn/server_pytest
xsn/server_refactor_split_task_common
xsn/server_remove_gpt_3_name
xsn/server_res_error_ok_static
xsn/server_response_generator_refactor
xsn/server_router_overrides
xsn/server_separate_pos_tokens
xsn/server_shutdown_timeout
xsn/server_sleep
xsn/server_std_move
xsn/server_stop_timeout
xsn/server_sync_docs
xsn/server_task_create_state
xsn/server_thread_join_stop
xsn/server_tighten_cancel
xsn/server_tts_streamed
xsn/server_tts
xsn/server_twice_ctrl_c
xsn/server_ui_tok_per_sec
xsn/server-bring-back-stream-final-chunk
xsn/server-cleaup-oai-logic
xsn/server-fix-infill-format
xsn/server-lib-version-bump
xsn/server-mistral-template
xsn/slot_state_machine_segv
xsn/slot_state_machine
xsn/smollm3_fix_jinja_tmpl
xsn/speed_up_compilation
xsn/split_http_server_context
xsn/split_without_tensor
xsn/tag_based_hf_repo
xsn/temp_fix_httplib
xsn/test_docker_arm
xsn/test_pixtral_fixed_size
xsn/this_tts_test
xsn/tool_call
xsn/typo_gml_glm
xsn/ui_copy_btn
xsn/ultravox
xsn/update_main_docs
xsn/use_httplib_boringssl_default
xsn/use_repeat_4d
xsn/vision
xsn/vision_2
xsn/voxtral
xsn/wasm_simd
xsn/webui_conv_branching
xsn/webui_fix_numeric_settings
xsn/webui_m_q_params
xsn/webui_max_file_size
xsn/webui_modalities
xsn/webui_pako
xsn/webui_pyodide
xsn/webui_reactjs
xsn/webui_rework_input
xsn/webui_small_misalignment
xsn/win_curl_static
xsn/wllama
xsn/xiaomi_mimo_v2
xsn/xiaomi_mimo
Commits:
1a274064  server: clean up using_chatml variable  (ngxson, committed 1 year ago, Verified)
ebe30795  server: validate "--chat-template" argument  (ngxson, committed 1 year ago)
7efef47d  server: format_llama2: remove BOS  (ngxson, committed 1 year ago)
269437e4  server: rename template mistral to llama2  (ngxson, committed 1 year ago)
27976c31  server: fix typo  (ngxson, committed 1 year ago)
2ebedda3  server: add mistral chat template  (ngxson, committed 1 year ago)
41f308f5  llama : do not print "offloading layers" message in CPU-only builds (#5416)  (slaren, committed 1 year ago, Verified)
6e99f2a0  Fix f16_sycl cpy call from Arc (#5411)  (abhilash1910, committed 1 year ago, Verified)
ff4ff05c  llava : add missing .py, and fix paths in README.md (#5414)  (danbev, committed 1 year ago, Verified)
b7b74cef  fix trailing whitespace (#5407)  (JohannesGaessler, committed 1 year ago, Verified)
4aa43fab  llama : fix MiniCPM (#5392)  (runfuture, committed 1 year ago, Verified)
a6e514a8  llava: fix typo/formatting in README.md (#5405)  (danbev, committed 1 year ago, Verified)
26d4efd1  sampling: fix top_k <= 0 (#5388)  (JohannesGaessler, committed 1 year ago, Verified)
8504d2d0  tests : .gitignore obj files  (ggerganov, committed 1 year ago, Verified)
c4fbb671  CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393)  (Xarbirus, committed 1 year ago, Verified)
8c933b70  fix typo in readme (#5399)  (ebeyabraham, committed 1 year ago, Verified)
b906596b  Add Ava in the list of llama.cpp UIs (#4362)  (cztomsik, committed 1 year ago, Verified)
aa7ab99b  CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386)  (JohannesGaessler, committed 1 year ago, Verified)
10afa6f1  [SYCL] update install make by w64devkit (#5297)  (NeoZhangJianyu, committed 1 year ago, Verified)
0ef46da6  llava-cli : always tokenize special tokens (#5382)  (jxy, committed 1 year ago, Verified)
ee1628bd  Basic Vulkan Multi-GPU implementation (#5321)  (0cc4m, committed 1 year ago, Verified)
ed0bf322  readme : modernize (#5379)  (netrunnereve, committed 1 year ago, Verified)
9a697d84  readme : update ui list (#5354)  (biw, committed 1 year ago, Verified)
316c7faf  llama : add MiniCPM support (#5346)  (runfuture, committed 1 year ago, Verified)
f3e2b4fa  server : update `/props` with "total_slots" value (#5373)  (jparkerweb, committed 1 year ago, Verified)
f68664ac  convert : fix TypeError on GPT-2 vocab.json (#5288)  (Sang-Kil Park, committed 1 year ago, Verified)
213d1439  server : remove model.json endpoint (#5371)  (z80maniac, committed 1 year ago, Verified)
17c97fb0  CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)  (JohannesGaessler, committed 1 year ago, Verified)
b08f22c8  Update README.md (#5366)  (ikawrakow, committed 1 year ago, Verified)
f57fadc0  Slight quantization improvement for Q4_K and Q5_K (#5361)  (ikawrakow, committed 1 year ago, Verified)