Chat response parsing (#40894)

Commit

114 days ago

Chat response parsing (#40894)

* Initial commit
* Adding more tests, bugfixes, starting tool tests
* Add support for JSON parsers and some tool tests
* stash commit
* stash commit
* stash commit
* stash commit
* stash commit
* Fix cohere schema, fix a lot of the recursive parser code
* GPT-OSS passing too!
* Update tests
* make fixup
* Offset tracking partially done
* stash commit
* stash commit
* Assistant masking Just Works
* make fixup
* stash commit
* stash commit
* JMESPath approach
* stash commit before i rip this PR apart
* Remove broken offset code
* Remove broken offset code
* Update chat parsing code and add tests for Ernie + fix Cohere tests for new format
* Implement tokenizer method
* jmespath dependency handling
* Completed TODOs
* Add support to TextGenerationPipeline
* Update GPT-OSS schema and test cases
* make fixup
* Fix typing (??)
* missing future import
* Use old typing in tokenization_utils_base.py
* put jmespath in various extras
* Remove accidental newline
* Guard tests correctly
* Remove require_jinja on the schema tests since we don't actually apply chat templates there
* make fixup
* fix some bad linter changes
* Fix docstring
* Push draft documentation
* Extend tests, more documentation
* make fixup
* docs docs docs
* Add Processor support
* Add to toctree
* Flag markdown correctly
* Remove double backslashes in docs for simplicity
* Simplify node-regex-to-dict
* Add support to ImageTextToTextPipeline
* Add support to ImageTextToTextPipeline and save/loading support in Processors
* Begin reworking docs to start fitting in response parsing
* Fix rebase
* Expand documentation further
* Expand documentation further
* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section
* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section
* More docs update
* Update TextGenerationPipeline to support tools properly
* Some rebase fixes
* Re-add is_jmespath_available
* Re-add is_jmespath_available
* Add Qwen3 parser and test, add maybe-json support
* Rollback processor changes - we'll wait for legacy saving to be deprecated
* Make fixup
* Revert ImageTextToText changes for now
* Add pipeline test
* make fixup
* Resolve a todo
* Resolve more TODOs and clean up the spec a little
* Add ref in the tools doc
* Update docs/source/en/chat_response_parsing.md
  Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/utils/chat_parsing_utils.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add a docstring for parse_response
* Add function docstring and reference it in the docs
* Fix generate link
* Revert Processor changes for now
* Use updated GPT-OSS format
* Print the dict keys instead of the whole dict so the example doesn't become too big

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

References

#40894 - Chat response parsing

Author

Rocketknight1

Parents

3f2db2c2

transformers 264cce9e - Chat response parsing (#40894)
