PR #11 Tool calls support improvements

Correct typo run_llama2.sh > run-llama2.sh (#9149)

cddae488

llama : fix llama_split_mode enum values in main_gpu document (#9057)

0ab30f8d

llama : fix typo in xcda_array_view comment [no ci] (#9132)

49271efb

sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908)

ea5d7478

nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook

a47667cf

llama : support RWKV v6 models (#8980)

8f1d81a0

llama : minor style

c6d4cb46

build(nix): Package gguf-py (#5664)

9c1ba557

llama-cli : remove duplicated log message (#9275)

b60074f1

server : refactor multitask handling (#9274)

6e7d133a

ggml : add pthread includes on FreeBSD (#9258)

f771d064

docker : fix missing binaries in full-cuda image (#9278)

048de848

src: make tail invalid when kv cell is intersection for mamba (#9249)

f1485161

server : test script : add timeout for all requests (#9282)

48baa61e

readme : refactor API section + remove old hot topics

b69a480a

llama-bench : add JSONL (NDJSON) output mode (#9288)

8962422b

flake.lock: Update (#9261)

7605ae7d

readme : rename result_format to response_format (#9300)

9379d3cc

rpc : make RPC servers come first in the device list (#9296)

82e3b03c

Fix broken links in docker.md (#9306)

c8671ae2

[SYCL] Fix DMMV dequantization (#9279)

5910ea94

ggml : AVX2 support for Q4_0_8_8 (#8713)

581c3051

llama-bench : fix NUL terminators in CPU name (#9313)

bdf314f3

cuda : fix defrag with quantized KV (#9319)

4db04784

CMake fix: host for msvc compiler can only be x86 or x64 (#8624)

1031771f

Update build.yml (#9184)

32b2ec88

ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)

9bc6db28

Improve Vulkan shader build system (#9239)

8ebe8dde

server : fix missing lock (#9334)

4a1411b4

ggml : fix build break for the vulkan-debug (#9265)

409dc4f8

batched-bench : add `--output-format jsonl` option (#9293)

815b1fb2

llama-bench : log benchmark progress (#9287)

134bc38e

server : simplify state machine for slot (#9283)

9b2c24c0

ci : disable rocm image creation (#9340)

6c89eb0b

ggml : fix missing `cpu_set_t` on emscripten (#9336)

947538ac

llama : refactor sampling v2 (#9294)

df270ef7

ggml : always check bounds on get_rows operations (#9354)

e32d0816

common : refactor arg parser (#9308)

1b9ae518

llamafile : disable sgemm for batch-size 1 (#9330)

e536426d

llama : sanitize invalid tokens (#9357)

faf69d42

llama : fix empty ring buffer push (#9358)

f12295b8

llama.android : fix build (#9350)

a5b5d9a1

llama : set attrs of mislabelled EOT/EOM tokens (#9348)

fbb7fcff

ggml : fix cont with transposed tensors when one dimension is 1 (ggml…

efe6a83e

cuda : mark BF16 CONT as unsupported

51d964a4

cann : add Ascend NPU support (whisper/2336)

d2d3200b

cann : fix doxy (ggml/0)

ba1cf846

ggml: fix ggml_graph_cpy undefined behavior (ggml/943)

dbbebcab

tests: add gradient tests for all backends (ggml/932)

202084d3

vulkan: correctly report support for OP_CONT (ggml/946)

9cb92608

vulkan: add dryrun support to sin and cos ops (ggml/947)

406c1a32

scripts : option to increase git patch context

60a3107c

sync : ggml

385decbd

metal : update support condition for im2col + fix warning (#0)

a8768614

imatrix : fix arg parser for imatrix (#9366)

00b02bb2

llama : sanitize tokens in the upper bound (#9359)

eae59718

[SYCL] add check malloc result on device (#9346)

2a358fb0

llama : refactor samplers internal implementation (#9370)

19f4a7b2

common : restore --n-gpu-layers (#9371)

a249843d

common : bring back missing args, add env var duplication check (#9375)

3f7ccfd6

cuda : fix FA Q src index (1 -> 0) (#9374)

e079bffb

Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend …

daa9623a

Arm AArch64: Documentation updates (#9321)

b2e89a32

rpc : update README [no ci] (#9320)

54f376d0

readme : add LLMUnity to UI projects (#9381)

5ed08757

CUDA: fix variable name conflict for Windows build (#9382)

8e6e2fbe

readme : update hot topics

38ca6f64

llama : minor sampling refactor (2) (#9386)

5fb5e248

ggml : vector length agnostic SVE support (#9290)

5fac4d57

rpc : fix segfault with nkvo (#9389)

293bebe0

common : move arg parser code to `arg.cpp` (#9388)

bfe76d4a

make : do not run llama-gen-docs when building (#9399)

fb3f2498

RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)

0b4ac757

llama : update llm_build_copy_mask_state comment [no ci] (#9385)

83008b7c

metal : fix compile warning with GGML_METAL_NDEBUG (#0)

00ba2ff7

llama : move random seed generation to the samplers (#9398)

49006c67

enable --special arg for llama-server (#9419)

8d300bd3

arg : bring back missing ifdef (#9411)

6cd4e034

flake.lock: Update (#9360)

cb9c933e

sycl : update support conditions (#9394)

51b60386

musa: remove Clang builtins mapping (#9421)

b34e0234

batched-bench : remove unused code (#9305)

d2b496bf

CUDA: fix --split-mode row race condition (#9413)

5af118ef

feat: Implements retrying logic for downloading models using --model-…

67155ab7

files : remove accidentally added `lora_test` submodule (#9430)

5bb2c5db

llava : correct args for minicpmv-cli (#9429)

0996c559

py : support converting local models (#7547)

8db003a1

llama : skip token bounds check when evaluating embeddings (#9437)

1b280614

Add Jais to list of supported models (#9439)

449ccfb6

cann: Fix error when running a non-exist op (#9424)

df4b7945

enhance run script to be easy to change the parameters (#9448)

c9c8575a

ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)

d6a04f87

riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)

2b00fa79

py : add special tokens in hf_converter for RWKV v6 (#9428)

39f852f4

cmake : fixed the order of linking libraries for llama-quantize (#9450)

ff76e185

ci : bump actions/checkout to v4 (#9377)

3c26a164

py : add Phi-1.5/Phi-2 tokenizer (#9361)

c837981b

ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)

4dc4f5f1

cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)

2a825116

lora : raise error if lm_head is ignored (#9103)

d4c3c10f

llava : fix the script error in MobileVLM README (#9054)

e6657443

cann: Add host buffer type for Ascend NPU (#9406)

e6b7801b

server : Add option to return token pieces in /tokenize endpoint (#9108)

78203641

feat: remove a sampler from a chain (#9445)

bd35cb0a

llama : llama_perf + option to disable timings during decode (#9355)

0abc6a2c

server : add loading html page while model is loading (#9468)

feff4aa8

llama : make cell_id const in inp_s_mask block (#9470)

befaf119

cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)

1f4111e5

server: add data: [DONE] to /chat/completions stream response (#9459)

dcdcee3a

ggml : ggml_type_name return "NONE" for invalid values (#9458)

822b6322

cmake : try to fix sycl+intel build (#9487)

7596487b

readme : update tools list (#9475)

d6b37c88

py : add "LLaMAForCausalLM" conversion support (#9485)

3c7989fd

cmake : correct order of sycl flags (#9497)

6988da94

gguf-split : add basic checks (#9499)

e6deac31

common : reimplement logging (#9418)

6262d13e

flake.lock: Update (#9488)

90a2fff0

metal : handle zero-sized allocs (#9466)

c4965a64

main : option to disable context shift (#9484)

441b72b9

llama : support MiniCPM3 (#9322)

95ca8516

llama : support OLMoE (#9462)

0aadac10

ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)

5c3d0f18

cmake : do not hide GGML options + rename option (#9465)

19514d63

convert : identify missing model files (#9397)

d54c21df

ggml : link MATH_LIBRARY not by its full path (#9339)

a6a3a5c5

llama : rename n_embed to n_embd in rwkv6_time_mix (#9504)

acb2c32c

ggml : move common CPU backend impl to new header (#9509)

23e0d70b

llama : add llama_n_head() (#9512)

37f3a381

llama : support IBM Granite architecture (#9412)

0d2ec438

unicode : add <algorithm> (#9508)

503147a9

threadpool : skip polling for unused threads (#9461)

02266138

llama : fix n_vocab init for 'no_vocab' case (#9511)

8344ef58

arg : add env variable for parallel (#9513)

8b836ae7

llama-bench: correct argument parsing error message (#9524)

7be099fa

[SYCL]set context default value to avoid memory issue, update guide (…

faf67b3d

server : fix OpenSSL build (remove obsolete `LOG_INFO`) (#9529)

f799155a

server : match OAI structured output response (#9527)

8a308354

llama : use reserve/emplace_back in sampler_sample (#9534)

6443ddd9

scripts : verify py deps at the start of compare (#9520)

0d2f22e4

ggml : fix n_threads_cur initialization with one thread (#9538)

64c6af31

imatrix : disable prompt escape by default (#9543)

eca0fab4

server : clean-up completed tasks from waiting list (#9531)

6026da52

Tool calls support improvements (support null content in messages, ha…

d3830ade

github-actions added examples

github-actions added server

Merge branch 'master' into tool-call-improvements

b67b817b

github-actions added documentation, ggml, python, Kompute, SYCL, Nvidia GPU, Vulkan, testing, build, devops, script, android, nix

mario7421 closed this 305 days ago
