Chat response parsing (#40894)
* Initial commit
* Adding more tests, bugfixes, starting tool tests
* Add support for JSON parsers and some tool tests
* stash commit
* stash commit
* stash commit
* stash commit
* stash commit
* Fix cohere schema, fix a lot of the recursive parser code
* GPT-OSS passing too!
* Update tests
* make fixup
* Offset tracking partially done
* stash commit
* stash commit
* Assistant masking Just Works
* make fixup
* stash commit
* stash commit
* JMESPath approach
* stash commit before I rip this PR apart
* Remove broken offset code
* Remove broken offset code
* Update chat parsing code and add tests for Ernie + fix Cohere tests for new format
* Implement tokenizer method
* jmespath dependency handling
* Completed TODOs
* Add support to TextGenerationPipeline
* Update GPT-OSS schema and test cases
* make fixup
* Fix typing (??)
* missing future import
* Use old typing in tokenization_utils_base.py
* put jmespath in various extras
* Remove accidental newline
* Guard tests correctly
* Remove require_jinja on the schema tests since we don't actually apply chat templates there
* make fixup
* fix some bad linter changes
* Fix docstring
* Push draft documentation
* Extend tests, more documentation
* make fixup
* docs docs docs
* Add Processor support
* Add to toctree
* Flag markdown correctly
* Remove double backslashes in docs for simplicity
* Simplify node-regex-to-dict
* Add support to ImageTextToTextPipeline
* Add support to ImageTextToTextPipeline and saving/loading support in Processors
* Begin reworking docs to start fitting in response parsing
* Fix rebase
* Expand documentation further
* Expand documentation further
* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section
* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section
* More docs update
* Update TextGenerationPipeline to support tools properly
* Some rebase fixes
* Re-add is_jmespath_available
* Re-add is_jmespath_available
* Add Qwen3 parser and test, add maybe-json support
* Rollback processor changes - we'll wait for legacy saving to be deprecated
* make fixup
* Revert ImageTextToText changes for now
* Add pipeline test
* make fixup
* Resolve a todo
* Resolve more TODOs and clean up the spec a little
* Add ref in the tools doc
* Update docs/source/en/chat_response_parsing.md
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/utils/chat_parsing_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add a docstring for parse_response
* Add function docstring and reference it in the docs
* Fix generate link
* Revert Processor changes for now
* Use updated GPT-OSS format
* Print the dict keys instead of the whole dict so the example doesn't become too big
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>