llm-foundry
Convert to DataSpec and add token counts that include padding
#676
Merged


dakinggg merged 15 commits into mosaicml:main from dakinggg:token-counting
dakinggg
make everything DataSpec with token counting function and tests
b3fe63b8
fix the places that assumed iterable
f88d2676
precommit and remove erroneous token counting from denoising text dat…
eac52cb0
pyright
81fff0f6
docstring
0159ec34
add support for encoder decoder batches and test denoising too
f7c03536
more complete test
947d722a
dakinggg marked this pull request as ready for review 2 years ago
dakinggg requested a review from alextrott16 2 years ago
dakinggg requested a review from mvpatel2000 2 years ago
mvpatel2000 commented on 2023-10-16
dakinggg Update llmfoundry/data/text_data.py
0edab5d5
dakinggg Update llmfoundry/data/text_data.py
e3dcfb51
precommit
6e8628a8
fix
1935674b
precommit
c6f381bf
more pyright
2334bae7
fix again
463f086c
alextrott16 approved these changes on 2023-10-16
PR comments
570c1e6a
dakinggg merged 4fa2dd88 into main 2 years ago
dakinggg deleted the token-counting branch 2 years ago
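The PR title describes token counts that *include* padding, and one commit adds support for encoder-decoder batches. The sketch below shows what such a counter could look like. It is the kind of callable that Composer's `DataSpec` accepts through its `get_num_tokens_in_batch` argument. This is an illustration only, not the code merged in this PR. The names `make_padded_token_counter` and `count_tokens` are hypothetical. The sketch assumes that a batch is a mapping with an `input_ids` key and, for encoder-decoder models, a `decoder_input_ids` key. To stay dependency-free, it uses nested Python lists instead of tensors.

```python
from collections.abc import Mapping
from typing import Any, Callable, Sequence


def make_padded_token_counter(decoder_only: bool = True) -> Callable[[Mapping[str, Any]], int]:
    """Build a function that counts every token position in a batch, padding included.

    Hypothetical helper for illustration; not llm-foundry's actual API.
    """

    def _count(rows: Sequence[Sequence[int]]) -> int:
        # Every position is counted, so pad tokens contribute like real tokens.
        return sum(len(row) for row in rows)

    def count_tokens(batch: Mapping[str, Any]) -> int:
        if not isinstance(batch, Mapping) or 'input_ids' not in batch:
            raise ValueError('Expected a mapping with an "input_ids" key.')
        total = _count(batch['input_ids'])
        if not decoder_only:
            # Assumption: encoder-decoder batches carry decoder-side ids under this key.
            if 'decoder_input_ids' not in batch:
                raise ValueError('Encoder-decoder batches need "decoder_input_ids".')
            total += _count(batch['decoder_input_ids'])
        return total

    return count_tokens


PAD = 0  # hypothetical pad token id
batch = {'input_ids': [[5, 6, 7, PAD], [8, 9, PAD, PAD]]}
print(make_padded_token_counter()(batch))  # 8: two rows of 4 positions each
```

Counting padded positions measures what the model actually processes per step, which varies with the padding strategy. A count that excludes padding would instead compare positions against a `pad_token_id`.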
