Adds support for chat-formatted finetuning input data. (#884)
* fix conflicting formatting and linting guidelines
* used older `Union[...]` syntax instead of the `|` operator for compatibility with older Python versions
* did the same thing in another place
* isort ignore specific lines
* fixes
* isort do not skip line
* address comments
* renamed some more things
* split tests and add some verification for tokenization split
* fix formatting
* added docstrings
* added end-to-end-test with HF dataset
* fix code style
* renamed file and fixed tests
* use chat template diff
* addressed comment
* Update llmfoundry/data/finetuning/tasks.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update llmfoundry/data/finetuning/tasks.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* fixed type of TokenizedExample
* use cast
* use _ALLOWED_{PROMPT, RESPONSE}_KEYS
* updated tests
* fix
* fix?
* Update llmfoundry/data/finetuning/tasks.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update llmfoundry/data/finetuning/tasks.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
---------
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>