llm-foundry
Convert to DataSpec and add token counts that include padding
#676
Merged
dakinggg merged 15 commits into mosaicml:main from dakinggg:token-counting
make everything DataSpec with token counting function and tests
b3fe63b8
fix the places that assumed iterable
f88d2676
precommit and remove erroneous token counting from denoising text dat…
eac52cb0
pyright
81fff0f6
docstring
0159ec34
add support for encoder decoder batches and test denoising too
f7c03536
more complete test
947d722a
dakinggg marked this pull request as ready for review 2 years ago
dakinggg requested a review from alextrott16 2 years ago
dakinggg requested a review from mvpatel2000 2 years ago
mvpatel2000 commented on 2023-10-16
Update llmfoundry/data/text_data.py
0edab5d5
Update llmfoundry/data/text_data.py
e3dcfb51
precommit
6e8628a8
fix
1935674b
precommit
c6f381bf
more pyright
2334bae7
fix again
463f086c
alextrott16 approved these changes on 2023-10-16
PR comments
570c1e6a
dakinggg merged 4fa2dd88 into main 2 years ago
dakinggg deleted the token-counting branch 2 years ago
Reviewers
alextrott16
mvpatel2000
Assignees
No one assigned
Labels
None yet
Milestone
No milestone