llama.cpp
Tool calls support improvements
#11
Closed

mario7421 wants to merge 144 commits into ngxson:xsn/tool_call from mario7421:tool-call-improvements
MakeDecisionWorth Correct typo run_llama2.sh > run-llama2.sh (#9149)
cddae488
kou llama : fix llama_split_mode enum values in main_gpu document (#9057)
0ab30f8d
danbev llama : fix typo in xcda_array_view comment [no ci] (#9132)
49271efb
Srihari-mcw sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908)
ea5d7478
enolan nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook
a47667cf
MollySophia llama : support RWKV v6 models (#8980)
8f1d81a0
ggerganov llama : minor style
c6d4cb46
ditsuke build(nix): Package gguf-py (#5664)
9c1ba557
nbcsm llama-cli : remove duplicated log message (#9275)
b60074f1
ngxson server : refactor multitask handling (#9274)
6e7d133a
yurivict ggml : add pthread includes on FreeBSD (#9258)
f771d064
slaren docker : fix missing binaries in full-cuda image (#9278)
048de848
kylo5aby src: make tail invalid when kv cell is intersection for mamba (#9249)
f1485161
ngxson server : test script : add timeout for all requests (#9282)
48baa61e
ggerganov readme : refactor API section + remove old hot topics
b69a480a
akx llama-bench : add JSONL (NDJSON) output mode (#9288)
8962422b
ggerganov flake.lock: Update (#9261)
7605ae7d
iscy readme : rename result_format to response_format (#9300)
9379d3cc
rgerganov rpc : make RPC servers come first in the device list (#9296)
82e3b03c
carlory Fix broken links in docker.md (#9306)
c8671ae2
OuadiElfarouki [SYCL] Fix DMMV dequantization (#9279)
5910ea94
Srihari-mcw ggml : AVX2 support for Q4_0_8_8 (#8713)
581c3051
slaren llama-bench : fix NUL terminators in CPU name (#9313)
bdf314f3
slaren cuda : fix defrag with quantized KV (#9319)
4db04784
Xarbirus CMake fix: host for msvc compiler can only be x86 or x64 (#8624)
1031771f
awatuna Update build.yml (#9184)
32b2ec88
compilade ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)
9bc6db28
mtavenrath Improve Vulkan shader build system (#9239)
8ebe8dde
ngxson server : fix missing lock (#9334)
4a1411b4
cyzero-kim ggml : fix build break for the vulkan-debug (#9265)
409dc4f8
akx batched-bench : add `--output-format jsonl` option (#9293)
815b1fb2
akx llama-bench : log benchmark progress (#9287)
134bc38e
ngxson server : simplify state machine for slot (#9283)
9b2c24c0
slaren ci : disable rocm image creation (#9340)
6c89eb0b
ngxson ggml : fix missing `cpu_set_t` on emscripten (#9336)
947538ac
ggerganov llama : refactor sampling v2 (#9294)
df270ef7
slaren ggml : always check bounds on get_rows operations (#9354)
e32d0816
ngxson common : refactor arg parser (#9308)
1b9ae518
netrunnereve llamafile : disable sgemm for batch-size 1 (#9330)
e536426d
ggerganov llama : sanitize invalid tokens (#9357)
faf69d42
ggerganov llama : fix empty ring buffer push (#9358)
f12295b8
ggerganov llama.android : fix build (#9350)
a5b5d9a1
bakkot llama : set attrs of mislabelled EOT/EOM tokens (#9348)
fbb7fcff
smeso ggml : fix cont with transposed tensors when one dimension is 1 (ggml…
efe6a83e
ggerganov cuda : mark BF16 CONT as unsupported
51d964a4
MengqingCao cann : add Ascend NPU support (whisper/2336)
d2d3200b
ggerganov cann : fix doxy (ggml/0)
ba1cf846
JohannesGaessler ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
dbbebcab
JohannesGaessler tests: add gradient tests for all backends (ggml/932)
202084d3
smeso vulkan: correctly report support for OP_CONT (ggml/946)
9cb92608
smeso vulkan: add dryrun support to sin and cos ops (ggml/947)
406c1a32
ggerganov scripts : option to increase git patch context
60a3107c
ggerganov sync : ggml
385decbd
ggerganov metal : update support condition for im2col + fix warning (#0)
a8768614
ngxson imatrix : fix arg parser for imatrix (#9366)
00b02bb2
slaren llama : sanitize tokens in the upper bound (#9359)
eae59718
NeoZhangJianyu [SYCL] add check malloc result on device (#9346)
2a358fb0
slaren llama : refactor samplers internal implementation (#9370)
19f4a7b2
slaren common : restore --n-gpu-layers (#9371)
a249843d
ngxson common : bring back missing args, add env var duplication check (#9375)
3f7ccfd6
ggerganov cuda : fix FA Q src index (1 -> 0) (#9374)
e079bffb
mtavenrath Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend …
daa9623a
eddnjjn Arm AArch64: Documentation updates (#9321)
b2e89a32
rgerganov rpc : update README [no ci] (#9320)
54f376d0
amakropoulos readme : add LLMUnity to UI projects (#9381)
5ed08757
JohannesGaessler CUDA: fix variable name conflict for Windows build (#9382)
8e6e2fbe
ggerganov readme : update hot topics
38ca6f64
slaren llama : minor sampling refactor (2) (#9386)
5fb5e248
Vithulep ggml : vector length agnostic SVE support (#9290)
5fac4d57
rgerganov rpc : fix segfault with nkvo (#9389)
293bebe0
ngxson common : move arg parser code to `arg.cpp` (#9388)
bfe76d4a
slaren make : do not run llama-gen-docs when building (#9399)
fb3f2498
MollySophia RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
0b4ac757
danbev llama : update llm_build_copy_mask_state comment [no ci] (#9385)
83008b7c
ggerganov metal : fix compile warning with GGML_METAL_NDEBUG (#0)
00ba2ff7
slaren llama : move random seed generation to the samplers (#9398)
49006c67
matteoserva enable --special arg for llama-server (#9419)
8d300bd3
ngxson arg : bring back missing ifdef (#9411)
6cd4e034
ggerganov flake.lock: Update (#9360)
cb9c933e
sycl : update support conditions (#9394)
51b60386
yeahdongcn musa: remove Clang builtins mapping (#9421)
b34e0234
ggerganov batched-bench : remove unused code (#9305)
d2b496bf
JohannesGaessler CUDA: fix --split-mode row race condition (#9413)
5af118ef
farbodbj feat: Implements retrying logic for downloading models using --model-…
67155ab7
ngxson files : remove accidentally added `lora_test` submodule (#9430)
5bb2c5db
ngxson llava : correct args for minicpmv-cli (#9429)
0996c559
EvilFreelancer py : support converting local models (#7547)
8db003a1
slaren llama : skip token bounds check when evaluating embeddings (#9437)
1b280614
fmz Add Jais to list of supported models (#9439)
449ccfb6
bachelor-dou cann: Fix error when running a non-exist op (#9424)
df4b7945
NeoZhangJianyu enhance run script to be easy to change the parameters (#9448)
c9c8575a
ggerganov ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
d6a04f87
Tameem-10xE riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
2b00fa79
MollySophia py : add special tokens in hf_converter for RWKV v6 (#9428)
39f852f4
Xarbirus cmake : fixed the order of linking libraries for llama-quantize (#9450)
ff76e185
trivikr ci : bump actions/checkout to v4 (#9377)
3c26a164
daminho py : add Phi-1.5/Phi-2 tokenizer (#9361)
c837981b
no1wudi ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
4dc4f5f1
Xarbirus cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)
2a825116
ngxson lora : raise error if lm_head is ignored (#9103)
d4c3c10f
fengerhu1 llava : fix the script error in MobileVLM README (#9054)
e6657443
bachelor-dou cann: Add host buffer type for Ascend NPU (#9406)
e6b7801b
mathijshenquet server : Add option to return token pieces in /tokenize endpoint (#9108)
78203641
giladgd feat: remove a sampler from a chain (#9445)
bd35cb0a
ggerganov llama : llama_perf + option to disable timings during decode (#9355)
0abc6a2c
ngxson server : add loading html page while model is loading (#9468)
feff4aa8
danbev llama : make cell_id const in inp_s_mask block (#9470)
befaf119
ggerganov cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
1f4111e5
VoidIsVoid server: add data: [DONE] to /chat/completions stream response (#9459)
dcdcee3a
ykhrustalev ggml : ggml_type_name return "NONE" for invalid values (#9458)
822b6322
Xarbirus cmake : try to fix sycl+intel build (#9487)
7596487b
OLSecret readme : update tools list (#9475)
d6b37c88
csabakecskemeti py : add "LLaMAForCausalLM" conversion support (#9485)
3c7989fd
Xarbirus cmake : correct order of sycl flags (#9497)
6988da94
slaren gguf-split : add basic checks (#9499)
e6deac31
ggerganov common : reimplement logging (#9418)
6262d13e
ggerganov flake.lock: Update (#9488)
90a2fff0
ggerganov metal : handle zero-sized allocs (#9466)
c4965a64
VJHack main : option to disable context shift (#9484)
441b72b9
CarryFun llama : support MiniCPM3 (#9322)
95ca8516
2015aroras llama : support OLMoE (#9462)
0aadac10
netrunnereve ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
5c3d0f18
ggerganov cmake : do not hide GGML options + rename option (#9465)
19514d63
compilade convert : identify missing model files (#9397)
d54c21df
Xarbirus ggml : link MATH_LIBRARY not by its full path (#9339)
a6a3a5c5
danbev llama : rename n_embed to n_embd in rwkv6_time_mix (#9504)
acb2c32c
slaren ggml : move common CPU backend impl to new header (#9509)
23e0d70b
Xarbirus llama : add llama_n_head() (#9512)
37f3a381
gabe-l-hart llama : support IBM Granite architecture (#9412)
0d2ec438
ykhrustalev unicode : add <algorithm> (#9508)
503147a9
max-krasnyansky threadpool : skip polling for unused threads (#9461)
02266138
Xarbirus llama : fix n_vocab init for 'no_vocab' case (#9511)
8344ef58
bertwagner arg : add env variable for parallel (#9513)
8b836ae7
Xarbirus llama-bench: correct argument parsing error message (#9524)
7be099fa
NeoZhangJianyu [SYCL]set context default value to avoid memory issue, update guide (…
faf67b3d
EZForever server : fix OpenSSL build (remove obsolete `LOG_INFO`) (#9529)
f799155a
VJHack server : match OAI structured output response (#9527)
8a308354
danbev llama : use reserve/emplace_back in sampler_sample (#9534)
6443ddd9
ggerganov scripts : verify py deps at the start of compare (#9520)
0d2f22e4
slaren ggml : fix n_threads_cur initialization with one thread (#9538)
64c6af31
CISC imatrix : disable prompt escape by default (#9543)
eca0fab4
ggerganov server : clean-up completed tasks from waiting list (#9531)
6026da52
mario7421 Tool calls support improvements (support null content in messages, ha…
d3830ade
github-actions added examples
github-actions added server
mario7421 Merge branch 'master' into tool-call-improvements
b67b817b
github-actions added documentation
github-actions added ggml
github-actions added python
github-actions added Kompute
github-actions added SYCL
github-actions added Nvidia GPU
github-actions added Vulkan
github-actions added testing
github-actions added build
github-actions added devops
github-actions added script
github-actions added android
github-actions added nix
mario7421 closed this 305 days ago

Reviewers
No reviews
Assignees
No one assigned