ngxson/llama.cpp
Commits on branch xsn/fix_ci
278d0e18 · Initialize default slot sampling parameters from the global context. (#8418) · HanClinto committed 1 year ago · Verified
dd07a123 · Name Migration: Build the deprecation-warning 'main' binary every time (#8404) · HanClinto committed 1 year ago · Verified
f4444d99 · [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) · AidanBeltonS committed 1 year ago · Verified
6b2a849d · ggml : move sgemm sources to llamafile subfolder (#8394) · ggerganov committed 1 year ago · Verified
0f1a39f3 · ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) · Dibakar committed 1 year ago · Verified
83321c69 · gguf-py rel pipeline (#8410) · monatis committed 1 year ago · Verified
cc61948b · llama : C++20 compatibility for u8 strings (#8408) · iboB committed 1 year ago · Verified
7a80710d · msvc : silence codecvt c++17 deprecation warnings (#8395) · iboB committed 1 year ago · Verified
a8be1e6f · llama : add assert about missing llama_encode() call (#8400) · fairydreaming committed 1 year ago · Verified
e4dd31ff · py : fix converter for internlm2 (#8321) · RunningLeon committed 1 year ago · Verified
8f0fad42 · py : fix extra space in convert_hf_to_gguf.py (#8407) · laik committed 1 year ago · Verified
a59f8fdc · Server: Enable setting default sampling parameters via command-line (#8402) · HanClinto committed 1 year ago · Verified
fd560fe6 · Update README.md to fix broken link to docs (#8399) · andysalerno committed 1 year ago · Verified
e500d613 · Deprecation warning to assist with migration to new binary names (#8283) · HanClinto committed 1 year ago · Verified
a03e8dd9 · make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) · JohannesGaessler committed 1 year ago · Verified
5b0b8d8c · sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) · Alberto Cabrera Pérez committed 1 year ago · Verified
9925ca40 · cmake : allow external ggml (#8370) · iboB committed 1 year ago · Verified
9beb2dda · readme : fix typo [no ci] (#8389) · daghanerdonmez committed 1 year ago · Verified
7d0e23d7 · gguf-py : do not use internal numpy types (#7472) · compilade committed 1 year ago · Verified
7fdb6f73 · flake.lock: Update (#8342) · ggerganov committed 1 year ago · Verified
a130ecce · labeler : updated sycl to match docs and code refactor (#8373) · Alberto Cabrera Pérez committed 1 year ago · Verified
c4dd11d1 · readme : fix web link error [no ci] (#8347) · b4b4o committed 1 year ago · Verified
2ec846d5 · sycl : fix powf call in device code (#8368) · Alberto Cabrera Pérez committed 1 year ago · Verified
3f2d538b · scripts : fix sync for sycl · ggerganov committed 1 year ago · Verified
2ee44c9a · sync : ggml · ggerganov committed 1 year ago
6847d54c · tests : fix whitespace (#0) · ggerganov committed 1 year ago
fde13b3b · feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) · balisujohn committed 1 year ago
470939d4 · common : preallocate sampling token data vector (#8363) · kevmo314 committed 1 year ago · Verified
6f0dbf6a · infill : assert prefix/suffix tokens + remove old space logic (#8351) · ggerganov committed 1 year ago · Verified
ffd00797 · common : avoid unnecessary logits fetch (#8358) · kevmo314 committed 1 year ago · Verified