llama.cpp
Tool calls support improvements
#11
Closed
mario7421 wants to merge 144 commits into ngxson:xsn/tool_call from mario7421:tool-call-improvements
Correct typo run_llama2.sh > run-llama2.sh (#9149)
cddae488
llama : fix llama_split_mode enum values in main_gpu document (#9057)
0ab30f8d
llama : fix typo in xcda_array_view comment [no ci] (#9132)
49271efb
sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908)
ea5d7478
nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook
a47667cf
llama : support RWKV v6 models (#8980)
8f1d81a0
llama : minor style
c6d4cb46
build(nix): Package gguf-py (#5664)
9c1ba557
llama-cli : remove duplicated log message (#9275)
b60074f1
server : refactor multitask handling (#9274)
6e7d133a
ggml : add pthread includes on FreeBSD (#9258)
f771d064
docker : fix missing binaries in full-cuda image (#9278)
048de848
src: make tail invalid when kv cell is intersection for mamba (#9249)
f1485161
server : test script : add timeout for all requests (#9282)
48baa61e
readme : refactor API section + remove old hot topics
b69a480a
llama-bench : add JSONL (NDJSON) output mode (#9288)
8962422b
flake.lock: Update (#9261)
7605ae7d
readme : rename result_format to response_format (#9300)
9379d3cc
rpc : make RPC servers come first in the device list (#9296)
82e3b03c
Fix broken links in docker.md (#9306)
c8671ae2
[SYCL] Fix DMMV dequantization (#9279)
5910ea94
ggml : AVX2 support for Q4_0_8_8 (#8713)
581c3051
llama-bench : fix NUL terminators in CPU name (#9313)
bdf314f3
cuda : fix defrag with quantized KV (#9319)
4db04784
CMake fix: host for msvc compiler can only be x86 or x64 (#8624)
1031771f
Update build.yml (#9184)
32b2ec88
ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)
9bc6db28
Improve Vulkan shader build system (#9239)
8ebe8dde
server : fix missing lock (#9334)
4a1411b4
ggml : fix build break for the vulkan-debug (#9265)
409dc4f8
batched-bench : add `--output-format jsonl` option (#9293)
815b1fb2
llama-bench : log benchmark progress (#9287)
134bc38e
server : simplify state machine for slot (#9283)
9b2c24c0
ci : disable rocm image creation (#9340)
6c89eb0b
ggml : fix missing `cpu_set_t` on emscripten (#9336)
947538ac
llama : refactor sampling v2 (#9294)
df270ef7
ggml : always check bounds on get_rows operations (#9354)
e32d0816
common : refactor arg parser (#9308)
1b9ae518
llamafile : disable sgemm for batch-size 1 (#9330)
e536426d
llama : sanitize invalid tokens (#9357)
faf69d42
llama : fix empty ring buffer push (#9358)
f12295b8
llama.android : fix build (#9350)
a5b5d9a1
llama : set attrs of mislabelled EOT/EOM tokens (#9348)
fbb7fcff
ggml : fix cont with transposed tensors when one dimension is 1 (ggml…
efe6a83e
cuda : mark BF16 CONT as unsupported
51d964a4
cann : add Ascend NPU support (whisper/2336)
d2d3200b
cann : fix doxy (ggml/0)
ba1cf846
ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
dbbebcab
tests: add gradient tests for all backends (ggml/932)
202084d3
vulkan: correctly report support for OP_CONT (ggml/946)
9cb92608
vulkan: add dryrun support to sin and cos ops (ggml/947)
406c1a32
scripts : option to increase git patch context
60a3107c
sync : ggml
385decbd
metal : update support condition for im2col + fix warning (#0)
a8768614
imatrix : fix arg parser for imatrix (#9366)
00b02bb2
llama : sanitize tokens in the upper bound (#9359)
eae59718
[SYCL] add check malloc result on device (#9346)
2a358fb0
llama : refactor samplers internal implementation (#9370)
19f4a7b2
common : restore --n-gpu-layers (#9371)
a249843d
common : bring back missing args, add env var duplication check (#9375)
3f7ccfd6
cuda : fix FA Q src index (1 -> 0) (#9374)
e079bffb
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend …
daa9623a
Arm AArch64: Documentation updates (#9321)
b2e89a32
rpc : update README [no ci] (#9320)
54f376d0
readme : add LLMUnity to UI projects (#9381)
5ed08757
CUDA: fix variable name conflict for Windows build (#9382)
8e6e2fbe
readme : update hot topics
38ca6f64
llama : minor sampling refactor (2) (#9386)
5fb5e248
ggml : vector length agnostic SVE support (#9290)
5fac4d57
rpc : fix segfault with nkvo (#9389)
293bebe0
common : move arg parser code to `arg.cpp` (#9388)
bfe76d4a
make : do not run llama-gen-docs when building (#9399)
fb3f2498
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
0b4ac757
llama : update llm_build_copy_mask_state comment [no ci] (#9385)
83008b7c
metal : fix compile warning with GGML_METAL_NDEBUG (#0)
00ba2ff7
llama : move random seed generation to the samplers (#9398)
49006c67
enable --special arg for llama-server (#9419)
8d300bd3
arg : bring back missing ifdef (#9411)
6cd4e034
flake.lock: Update (#9360)
cb9c933e
sycl : update support conditions (#9394)
51b60386
musa: remove Clang builtins mapping (#9421)
b34e0234
batched-bench : remove unused code (#9305)
d2b496bf
CUDA: fix --split-mode row race condition (#9413)
5af118ef
feat: Implements retrying logic for downloading models using --model-…
67155ab7
files : remove accidentally added `lora_test` submodule (#9430)
5bb2c5db
llava : correct args for minicpmv-cli (#9429)
0996c559
py : support converting local models (#7547)
8db003a1
llama : skip token bounds check when evaluating embeddings (#9437)
1b280614
Add Jais to list of supported models (#9439)
449ccfb6
cann: Fix error when running a non-exist op (#9424)
df4b7945
enhance run script to be easy to change the parameters (#9448)
c9c8575a
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
d6a04f87
riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
2b00fa79
py : add special tokens in hf_converter for RWKV v6 (#9428)
39f852f4
cmake : fixed the order of linking libraries for llama-quantize (#9450)
ff76e185
ci : bump actions/checkout to v4 (#9377)
3c26a164
py : add Phi-1.5/Phi-2 tokenizer (#9361)
c837981b
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
4dc4f5f1
cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)
2a825116
lora : raise error if lm_head is ignored (#9103)
d4c3c10f
llava : fix the script error in MobileVLM README (#9054)
e6657443
cann: Add host buffer type for Ascend NPU (#9406)
e6b7801b
server : Add option to return token pieces in /tokenize endpoint (#9108)
78203641
feat: remove a sampler from a chain (#9445)
bd35cb0a
llama : llama_perf + option to disable timings during decode (#9355)
0abc6a2c
server : add loading html page while model is loading (#9468)
feff4aa8
llama : make cell_id const in inp_s_mask block (#9470)
befaf119
cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
1f4111e5
server: add data: [DONE] to /chat/completions stream response (#9459)
dcdcee3a
ggml : ggml_type_name return "NONE" for invalid values (#9458)
822b6322
cmake : try to fix sycl+intel build (#9487)
7596487b
readme : update tools list (#9475)
d6b37c88
py : add "LLaMAForCausalLM" conversion support (#9485)
3c7989fd
cmake : correct order of sycl flags (#9497)
6988da94
gguf-split : add basic checks (#9499)
e6deac31
common : reimplement logging (#9418)
6262d13e
flake.lock: Update (#9488)
90a2fff0
metal : handle zero-sized allocs (#9466)
c4965a64
main : option to disable context shift (#9484)
441b72b9
llama : support MiniCPM3 (#9322)
95ca8516
llama : support OLMoE (#9462)
0aadac10
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
5c3d0f18
cmake : do not hide GGML options + rename option (#9465)
19514d63
convert : identify missing model files (#9397)
d54c21df
ggml : link MATH_LIBRARY not by its full path (#9339)
a6a3a5c5
llama : rename n_embed to n_embd in rwkv6_time_mix (#9504)
acb2c32c
ggml : move common CPU backend impl to new header (#9509)
23e0d70b
llama : add llama_n_head() (#9512)
37f3a381
llama : support IBM Granite architecture (#9412)
0d2ec438
unicode : add <algorithm> (#9508)
503147a9
threadpool : skip polling for unused threads (#9461)
02266138
llama : fix n_vocab init for 'no_vocab' case (#9511)
8344ef58
arg : add env variable for parallel (#9513)
8b836ae7
llama-bench: correct argument parsing error message (#9524)
7be099fa
[SYCL]set context default value to avoid memory issue, update guide (…
faf67b3d
server : fix OpenSSL build (remove obsolete `LOG_INFO`) (#9529)
f799155a
server : match OAI structured output response (#9527)
8a308354
llama : use reserve/emplace_back in sampler_sample (#9534)
6443ddd9
scripts : verify py deps at the start of compare (#9520)
0d2f22e4
ggml : fix n_threads_cur initialization with one thread (#9538)
64c6af31
imatrix : disable prompt escape by default (#9543)
eca0fab4
server : clean-up completed tasks from waiting list (#9531)
6026da52
Tool calls support improvements (support null content in messages, ha…
d3830ade
github-actions added labels: examples, server
Merge branch 'master' into tool-call-improvements
b67b817b
github-actions added labels: documentation, ggml, python, Kompute, SYCL, Nvidia GPU, Vulkan, testing, build, devops, script, android, nix
mario7421 closed this 305 days ago
Reviewers: No reviews
Assignees: No one assigned
Labels: documentation, examples, ggml, python, server, Kompute, SYCL, Nvidia GPU, Vulkan, testing, build, devops, script, android, nix
Milestone: No milestone