llama.cpp
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars
#9639
Merged


ochafik merged 375 commits into ggml-org:master from ochafik:tool-call
github-actions added the testing label
github-actions added the examples label
github-actions added the python label
github-actions added the server label
ochafik changed the title from "Tool call support (Llama 3.1, Functionary 3.2, Hermes 2 Pro) & Minimalist Jinja template engine" to "Tool call support (Llama 3.1, Functionary v3, Hermes 2 Pro) & Minimalist Jinja template engine" 1 year ago
ochafik changed the title from "Tool call support (Llama 3.1, Functionary v3, Hermes 2 Pro) & Minimalist Jinja template engine" to "Tool call support (Llama 3.1, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine" 1 year ago
ochafik changed the title from "Tool call support (Llama 3.1, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine" to "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine" 1 year ago
github-actions added the script label
ochafik changed the title from "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine" to "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars & minimalist Jinja engine" 1 year ago
nits
ec9f3b10
`tool-call`: slow tool call integration tests
9a86ea79
space nits
c88095e3
`tool_call`: test no tool call on a real model + rename scenarios
7fde6d00
`tool-call`: script to prefetch models used in server tests
dd6d0241
Update tool_call.feature
168add7e
`tool-call`: add tests: tool_call=none, parallel_tool_calls=true
ec547e41
`tool-call`: remove duplicate script to fetch templates
b51c71c7
`agent`: simplify syntax (default tools to local w/ default port)
74d71a67
`tool-call`: use Q4_K_M models
b825440c
`tool-call`: update scripts/fetch_server_test_models.py
aefac1e5
`tool-call`: test Hermes-3-Llama-3.1-8B
64287a32
`tool-call`: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too…
fa4c1119
`tool-call`: force printing of lazy grammar trigger tokens to regular…
773ff91b
nits
92c384a5
ggerganov commented on 2024-10-30 (4 comments)
`tool-call`: support tool_use variant in llama_chat_template_from_mod…
3ebdb2b8
`tool-call`: fix missing initializer errors
35ac17f3
`tool-call`: when slow server tests fail, hint to run `python scripts…
5227321d
`tool-calls`: test Qwen2.5-7B-Instruct-Q4_K_M.gguf
e4d54496
Merge remote-tracking branch 'origin/master' into tool-call
61655b9c
Update llama-sampling.cpp
be9de3ed
`tool-call`: greedy sampling in server tests + tweak prompt
542853b3
`tool-call`: nemo tweak (accept raw sql again)
7d9c90f4
Update tool_call.feature
e8d9d711
`tool-call`: behaviour-based detection of template features
c395d480
`tool-call`: code_interpreter & system + tool call support for all ji…
f5b78255
`tool-call`: don't use -fa w/ Mistral-Nemo (hard crashes?)
c773516d
`tool-call`: add LLAMA_UPDATE_GOLDENS env for test-chat-template
b35aa4ae
`tool-call`: functionary-small-v3.2 test now green
9477c546
Update README.md
c4a80501
nits
f5f74751
Update README.md
fe967b61
`tool-call`: fix qwen template test
479c1520
`agent`: add missing tool name in response!
bc52c0a4
`agent`: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch …
c059aecd
`minja`: don't explode upon referencing a field on an array (fixes He…
5789f69d
Update README.md
f9b19690
agent: add --think "tool", default to local tools endpoint, support -…
adc673c3
Merge remote-tracking branch 'origin/master' into tool-call
1afa3128
agent: more robust squid config
30fbcb23
agent: update readme
a469f536
minja: remove tests (now in https://github.com/google/minja)
cbe395d8
Update README.md
1fd5f1af
minja: sync @ https://github.com/google/minja/commit/916c181c0d4a6f96…
5d0033f5
tool-call: add firefunction-v2 style
1f0b1579
tool-calls: migrate tests to pytest
93a5245b
Merge remote-tracking branch 'origin/master' into tool-call
055053c8
tool-calls: shorter name: grammar_triggers
1e2115ff
Merge remote-tracking branch 'origin/master' into tool-call
7bfcd0a8
tool-call: stabilize server tests
7e3feff0
Merge remote-tracking branch 'origin/master' into tool-call
e70ce3f6
Update test-tool-call.cpp
f0bd6938
Update minja.hpp https://github.com/google/minja/commit/202aa2f3de21b…
f645887e
rm trailing spaces
0e87ae24
Update fetch_server_test_models.py
0a5d5275
Fix tool-call server tests
a2fe8a49
Simplify tool call grammars when there's only 1 tool
523ebf8c
Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcb…
abd274a4
Add --jinja and --chat-template-file flags
e5113e8d
Add missing <optional> include
80138d90
Avoid print in get_hf_chat_template.py
06b51595
No designated initializers yet
ce48584f
Try and work around msvc++ non-macro max resolution quirk
389d79b6
Update test_chat_completion.py
238b9689
Merge remote-tracking branch 'origin/master' into jinja
cb72cf1f
Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
78861a3e
Refactor test-chat-template
1aac99ad
Test templates w/ minja
7c84ebc2
Fix deprecation
18f257bf
Add --jinja to llama-run
8dd4f334
Merge remote-tracking branch 'origin/master' into jinja
c04c50e4
Update common_chat_format_example to use minja template wrapper
a6afb273
Test chat_template in e2e test
b4083e41
Update utils.py
b7e21710
Update test_chat_completion.py
a57bb94e
Update run.cpp
4daae0bf
ochafik Update arg.cpp
1b3bb7ee
Merge branch 'jinja' into tool-call
e7ff6ecd
Fix merge
7a7d6f6a
Update test-chat-template.cpp
e183fa9e
Merge remote-tracking branch 'origin/master' into tool-call
010726ce
Update test-chat-template.cpp
d47f40ca
Merge remote-tracking branch 'origin/master' into jinja
3ed670b6
Refactor common_chat_* functions to accept minja template + use_jinja…
3c7784c5
Refactor common_chat_* functions to accept minja template + use_jinja…
b75d0622
Merge remote-tracking branch 'origin/master' into jinja
40db7896
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
81c0d437
Merge branch 'jinja' into tool-call
138a4ba8
Revert LLAMA_CHATML_TEMPLATE refactor
d5fa351a
Merge branch 'jinja' into tool-call
045edd1d
Fix fetch_server_test_models.py (avoid conv trap)
2ceabee0
tools: greedy sampling in tests
259d9e45
tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models)
acf7c240
Normalize newlines in test-chat-templates for windows tests
ee1e10e2
Forward decl minja::chat_template to avoid eager json dep
e63520f3
Flush stdout in chat template before potential crash
33322e82
Fix copy elision warning
5074e6fe
Merge branch 'jinja' into tool-call
76893f58
Rm unused optional include
fc60802b
Add missing optional include to server.cpp
0e74c9da
Merge branch 'jinja' into tool-call
d6f058da
Disable jinja test that has a cryptic windows failure
e3c475cd
minja: fix vigogne (https://github.com/google/minja/pull/22)
cc503564
Merge branch 'jinja' into tool-call
c207fdcd
agent: add --greedy, --top-p, --top-k options
0401a83b
ochafik Apply suggestions from code review
153e8524
Finish suggested renamings
db9dd0c1
Move chat_templates inside server_context + remove mutex
c9e8fdd7
Update --chat-template-file w/ recent change to --chat-template
8c84aefd
Refactor chat template validation
154bfaaa
Merge remote-tracking branch 'origin/master' into jinja
099f9839
Guard against missing eos/bos tokens (null token otherwise throws in …
54a669e0
Warn against missing eos / bos tokens when jinja template references …
8348c605
rename: common_chat_template[s]
ee475d2f
reinstate assert on chat_templates.template_default
8a7c89e6
Merge branch 'jinja' into tool-call
9bab6939
apply renames from jinja branch
b1103747
Update minja to https://github.com/google/minja/commit/b8437df626ac6c…
8347da90
Merge branch 'jinja' into tool-call
7ea6a06c
fix std imports for gcc build
56aa93c2
Update minja to https://github.com/google/minja/pull/25
ff2cce57
Merge branch 'jinja' into tool-call
ba8dd66f
Update minja from https://github.com/google/minja/pull/27
9d8ebd62
Merge branch 'jinja' into tool-call
c6062559
Merge remote-tracking branch 'origin/master' into tool-call
fec02603
rm tests/test-minja from makefile
b49d0521
Remove examples/agent (moved to https://gist.github.com/ochafik/9246d…
f6e73dac
Delete update_jinja_goldens.py
77f4098c
ochafik changed the title from "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars & minimalist Jinja engine" to "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars" 1 year ago
Push laziness down to grammar impl
dbf841b0
minimize diffs
ef61a4c7
common_tool_call rename
39729457
shrink diff in json conversion code
d77fecc3
Refactor string helpers into common
5268ec89
follow enum naming style for tool call styles
9e8b43f9
Factor string_join, string_split, string_repeat into common
9a5acbb4
json: refactor to surface a versatile builder
4de5cf8a
drop unused fs_list_files
03fe80f1
Merge branch 'string_utils' into tool-call
41a613bb
Update common.cpp
5140d7a0
Merge branch 'string_utils' into tool-call
e211629b
drop llama_sampler_accept_str
28cac497
more cleanups
2dd09c79
Merge remote-tracking branch 'origin/master' into tool-call
01b345be
merge common_tool_calls into common_chat_msg
82b6e9a5
smaller diff
63387c6d
nits
a4226365
Update tool-call.cpp
cce1166b
Greedy sampling in tool call tests
c6a22edc
Update test_chat_completion.py
30d33d9f
Sync minja after https://github.com/google/minja/pull/29
9ccc62b3
Merge remote-tracking branch 'origin/master' into tool-call
d186721e
fix common_chat_msg invocations
f0231a58
fix msg init warning
5e358ade
Update chat-template.hpp
cdfa8b9d
Add grammar options + rename builder to common_grammar_builder
a46de6a0
Update real tool call tests (use less models)
c2d836f9
Fix lazy trigger handling
46415d7a
WIP chat handlers
36ed106f
tool-call: allow special tokens that are grammar triggers
c479d39a
Update test_chat_completion.py
0208b207
jinja: don't add bos when jinja enabled
a6463c1e
Update test_chat_completion.py
51b7aab8
nit: trailing spaces
3f3fc039
Merge branch 'tool-call' into tool-call-handler
11594557
sync: minja
43385b2f
reshuffle chat handlers
5ec4c5e4
tool-call: fix functionary v3.1 required test
f7078cab
nits
ca0c837b
tool-call: fix special handling of special trigger tokens (Nemo)
bddc1beb
tool-call: remove nonsensical code_interpreter code
da606d8d
jinja: only add special tokens if template doesn't seem to handle them
15ec01e8
tool-call: add weather tool e2e tests
2efa0c27
tool-call: fix lazy grammar & mixed content + tool calls parsing
57f40e36
tool-call: compact json output to cap # tokens generated
67709552
Update test_chat_completion.py
09971e62
Prepare DeepSeek-R1-Distill-Llama-8B support
92ac336d
DeepSeek-R1: implement grammar constraints
118f799a
fix test-chat-handler grammar tests
add91241
Rehabilitate test_format_detection
fa065eb0
updated tool call example to be less ambiguous (deepseek likes to ran…
ad229783
Pass grammar laziness all the way down to sampler (need to print spec…
90effb84
Split e2e test_tool_call from test_chat_completion
cafea609
comment out broken tests in test_tool_call.py
b565ab2a
Update test-chat-handler.cpp
2d607f1a
Fix Llama 3.1 (incl. constrained builtin tools e.g. `<|python_tag|>fo…
ef9efc9e
Allow tool use + streaming
62717145
Cleanup dead code in llama_3_1 tool call code
6d568290
Tool-call: do last partial parse upon limit stop
2f99236f
Update test-chat-handler.cpp
0a51e514
build: Add missing optional include for gcc
d274ffcc
Disable slow tests where appropriate, + nits
62d45a55
github-actions added the devops label
Revert "Allow tool use + streaming"
ec4aeaf1
Simplify parser defs (incremental parsing for streaming will need mor…
b5a74d1a
Add missing link dep for windows build
ba10b47a
beef up test-chat-handler w/ delta expectations
cd63ba43
Disable test-chat-handler on win32 like the other grammar-related tests
cad1448a
minja: sync on https://github.com/google/minja/pull/33
4f257550
sync: minja
d603d067
Fix firefunction w/ jinja: requires two variables, use the chat handl…
64263910
Revert breaking minja change
4cdbb8c5
Text fireworks v2 template
47be4373
nits
18d5a1b2
ochafik changed the title from "Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars" to "Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars" 1 year ago
refactor test-chat-handler
4a1e8e9f
rm dead code + nits
923c805d
Split bulk of tool call tests to slow lane
384f54a1
Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call
40cc3f2f
rm unused templates, rename one
41eec462
Update test_tool_call.py
76f6ab19
tool-calls: disable crashing tests
77dd67c2
nits
0f8af536
Merge remote-tracking branch 'origin/master' into tool-call
babdefc4
Create meta-llama-Llama-3.1-8B-Instruct.jinja
682026f8
Move templates/ under models/
7b5e0803
Unify llama 3.x chat handling again (allow `{"type": "function", "nam…
ba27e985
sync: minja
6e676c80
Rename: common/chat.*, common_chat_{inputs -> params}
ed7c622d
Finish renaming of chat inputs vs. params [skip ci]
36c776f3
nits
bc8a6113
Remove server tests LLAMA_CACHE override (tests are serial, and the c…
84bc083f
Add cli mode to test-chat to generate template summaries markdown
2b245697
ochafik marked this pull request as ready for review 1 year ago
ochafik requested a review from ngxson 1 year ago
Somehow /* bad inside block comments, ok fine.
64545ac9
Add tool call to hot topics
cbecb356
ngxson commented on 2025-01-29
Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env)
a810c37c
Avoid passing tools twice in generic handler (now that minja passes t…
77c60e66
Unify content + message in server_task_result_cmpl_final (+ avoid str…
d86a1ae8
llama 3.1: allow `{name:` & `{function:` syntax even w/ builtin tools…
774557cf
Update tests readme + add raw output to verbose log
590c9793
split chat handler vs. parser around enum again
f8e14bff
nits
81547e6f
debug logs are back
18450e69
rm unused llama_param
b831a6e0
llama 3.2 1b now fails the weather tool call?
7635912f
increase http timeout to 12
9591af1f
ngxson commented on 2025-01-30
Merge remote-tracking branch 'origin/master' into tool-call
8ef37a3c
ngxson code style changes on test
2d51c459
ngxson simplify handle_apply_template
c88f4a79
Fix debug + verbose
3dcde9ea
Update test_chat_completion.py
06c4ca56
Update test_chat_completion.py
0c171f54
Update scripts/fetch_server_test_models.py to new compact hf_repo syn…
96850432
ggerganov approved these changes on 2025-01-30
ggerganov requested a review from ngxson 1 year ago
nit: fix py import
2bb3fed3
deprecate llama_sampler_init_grammar -> llama_sampler_grammar_init
7d59bf44
add llama_sampler_init_grammar_lazy instead of renaming the non-lazy
5a64af6c
Format test-chat.cpp
f223df02
log prompt + nits
82052466
ngxson test: leave model_hf_file blank
5add261a
force printing </tool_call> on hermes 2 model if/as it's a special token
1029ff90
try and avoid weird server test failure (spillage / parallelism betwe…
3bd6abeb
ochafik added the enhancement label
Disable chat_completion tests of non-tool jinja mode
729d2d36
Fix typo
34f54dd1
ochafik added the merge ready label
ochafik merged 8b576b6c into master 1 year ago
ggerganov added the hot label
