structured config for train.py and eval.py (#1051)
* first commit for structured config for train.py
* revamp configs
* wip latest issue
* reorder so mandatory attributes come first
* fix
* fix
* fix fix
* fix types
* fix dictconfig
* fix union of list|dict configs
* fix type annotation
* oops
* fixed configs
* add save ignore keys
* fix batch size kerfuffle
* fix dictconfig stuff
* fix dictconfig stuff again
* fix
* fix
* updated unit tests for variables
* last fix?
* if this test case does not pass I will venmo Mihir 0
* remove a 'not' -- eg. 'I am not going crazy'
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* set amp bf16 as default precision, etc
* temporarily wrap with dictconfig before ** migration
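The commit above describes an interim migration pattern: keep wrapping the raw config in a DictConfig-like object so legacy attribute-style access (`cfg.lr`) still works, while callsites are gradually moved to plain `**` keyword unpacking. A minimal stdlib-only sketch of that pattern, with a stand-in `AttrDict` in place of omegaconf's `DictConfig` and illustrative field names (not the actual llm-foundry code):

```python
# Illustrative stand-in for omegaconf's DictConfig: attribute-style access
# over a plain dict, so legacy callsites like `cfg.lr` keep working while
# new callsites migrate to `**cfg` keyword unpacking.
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as e:
            raise AttributeError(key) from e

def build_optimizer_kwargs(lr: float, weight_decay: float = 0.0) -> dict:
    # New-style callsite: plain keyword arguments, no config object.
    return {'lr': lr, 'weight_decay': weight_decay}

raw = {'lr': 1e-4, 'weight_decay': 0.01}
cfg = AttrDict(raw)                        # temporary wrapper during migration

legacy = cfg.lr                            # old attribute-style access
migrated = build_optimizer_kwargs(**cfg)   # new **-unpacking callsite
```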
* fix icl tasks
* fix
* fix activation checkpointing reentrant
* fix extraneous keys
* first round **
* fix?
* quick fsdp config fix
* updated yamls to make variables explicit
* remove precision from mandatory params list
* I expect many of these to fail in interesting ways
* fix test_model test cases with **
* fix many more test cases
* fix dictconfig objectification
* fix remaining test cases
* remove unneeded **
* fix test case
* changed back argument name
* fix
* ** for finetuning dataloader
* fix?
* fix dataloader
* fix
* fix finetuning dataloader
* fix build_text_dataloader
* left to my own devices
* fix packing
* fix typo
* fix padding test cases
* ignore extra parameters and warn
* fix style
* fix quality checks
* fix code quality
* pyright-fu
* fix
* just one more type constraint bro
* OmegaConf -> om
* rename variables for clarity
* revert file
* revert file II
* revert file III: revert of the sith
* peft revert file
* revert v_mpt
* last revert
* remove redundant checks
* deprecate
* make cleaner
* pyright is bullying me again
* further clean config_utils
* polish train
* polish train and eval
* fix dist
* fix style
* organize eval and train
* fix
* used helper function to make main cleaner
* fix stuff
* fix pyright
* added fix and explanation
* fix typo in unit test update smh
* Update llmfoundry/registry.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* see if this fails
* reject name and device rather than ignoring
* pretrained is not a bool
* add validation to make sure the user doesn't set both
* forbid config keys
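The few commits above move from silently ignoring problem keys to rejecting them: keys the builder sets itself (`name`, `device`) are forbidden outright, and mutually exclusive options must not both be set. A hypothetical sketch of that validation; the forbidden-key set and the exclusive pair shown here are assumptions for illustration, not the exact llm-foundry rules:

```python
# Hypothetical validation helper: forbid keys the builder sets itself, and
# reject mutually exclusive pairs instead of silently ignoring one of them.
FORBIDDEN_KEYS = {'name', 'device'}                         # assumed keys
EXCLUSIVE_PAIRS = [('pretrained', 'init_from_checkpoint')]  # assumed pair

def validate_model_config(cfg: dict) -> None:
    present_forbidden = FORBIDDEN_KEYS & cfg.keys()
    if present_forbidden:
        raise ValueError(
            f'Config keys {sorted(present_forbidden)} are set by the builder '
            'and may not appear in the model config.')
    for a, b in EXCLUSIVE_PAIRS:
        if a in cfg and b in cfg:
            raise ValueError(f'Set at most one of {a!r} and {b!r}.')

validate_model_config({'pretrained': 'mpt-7b'})    # passes
try:
    validate_model_config({'name': 'mpt_causal_lm'})
except ValueError as e:
    err = str(e)                                   # forbidden key rejected
```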
* oops forgot eval
* address comments
* removed redundant check
* updated callsites not to use name
* fix
* validate extraneous keys in dataloader
* fix
* fix more
* fix III: revenge of the fix
* fix IV: a new hope
* fix V: the empire fixes back
* fixed some more types
* fix VI: return of the fix
* fix VII: the fix awakens
* fix VIII: the last bug
* fix
* final fix I think
* fixed
* fix style
* fix
* fix fix
* fix fix style
* icl task config
* fix train
* fix finetuning dataloader
* fix train types
* fix token counting
* fix train types
* oopsie
* fix straggler issues
* fix tests
* fix???
* fix hf v mpt gpu test and fmapi test
* pop device
* to_str_dict -> to_dict_recursive
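The rename above suggests a helper that recursively converts a nested config container into plain built-in types. A rough sketch of what a `to_dict_recursive`-style function might look like (the actual llm-foundry helper operates on omegaconf containers; this stdlib version illustrates the shape):

```python
from collections.abc import Mapping, Sequence

# Sketch of a `to_dict_recursive`-style helper: walk a nested config
# container and return plain dicts and lists all the way down, with
# string keys at every mapping level.
def to_dict_recursive(obj):
    if isinstance(obj, Mapping):
        return {str(k): to_dict_recursive(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        return [to_dict_recursive(v) for v in obj]
    return obj

nested = {'model': {'name': 'mpt', 'layers': [{'d_model': 128}]}}
plain = to_dict_recursive(nested)
```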
* fix this darn unit test one more time
* fix ComposerMPTCausalLM constructor invocation
* Delete tests/models/hf/test_hf_fsdp.py
* unwrap model in unit tests
* model.model.model.model.model
* abstract away dataclass construction
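"Abstract away dataclass construction" together with the earlier "ignore extra parameters and warn" commit points at a generic helper that builds a typed config dataclass from an untyped dict, warning on keys the dataclass does not declare. A hypothetical sketch, with an illustrative subset of `TrainConfig` fields:

```python
import warnings
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:            # illustrative subset of fields, not the real one
    model: dict
    max_duration: str
    seed: int = 17

# Hypothetical helper: construct a config dataclass from a raw dict,
# warning on (and dropping) extraneous keys rather than crashing.
def make_dataclass_from_dict(cls, cfg: dict):
    known = {f.name for f in fields(cls)}
    extras = set(cfg) - known
    if extras:
        warnings.warn(f'Ignoring extraneous config keys: {sorted(extras)}')
    return cls(**{k: v for k, v in cfg.items() if k in known})

train_cfg = make_dataclass_from_dict(
    TrainConfig,
    {'model': {'name': 'mpt'}, 'max_duration': '1ep', 'typo_key': 1},
)
```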
* updated docstrings and removed dictconfig from logging logic
* flag icl tasks required or not
* updated a couple yamls
* updated train and eval scripts
* un-delete global train batch size
* fix
* I don't understand why this doesn't work
* that was the sneakiest bug I've ever fixed
* try to fix the regression test
* remove device train grad accum
* fix validate config
* removed unused import
* use variables
* missing mandatory value fix
* use correct type of error
* fix
* import TrainConfig just in case?
* moved trainconfig and evalconfig into utils
* works
* no cheating
* dicts everywhere gah
* try no recursive just
* rename typed helpers
* fix the test cases with deep magic
* towards a peaceful resolution
* remove comments
* fix type warnings
* Update llmfoundry/utils/config_utils.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* address low-hanging fruit
* remove peft wrapping extra model
* python :handshake: haskell
* dataset config should be dict
* just because omega starts with OMMMM does not mean it's zen
* fix
* fix
* structured settlement
* precision further down
* throws TypeError instead of MissingMandatoryValue or whatever
* remove debugging statement
* remove to_container calls everywhere
* wrap then unwrap
* pyright
* error early on missing mandatory values
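Combined with the earlier "throws TypeError instead of MissingMandatoryValue" commit, this suggests scanning the built config up front for unfilled mandatory fields and raising a plain `TypeError`, rather than letting omegaconf's `MissingMandatoryValue` surface later at first attribute access. A stdlib-only sketch of that check, using omegaconf's `'???'` missing-value marker and an illustrative `EvalConfig`:

```python
from dataclasses import dataclass, fields

MISSING_SENTINEL = '???'      # omegaconf-style marker for a missing value

@dataclass
class EvalConfig:             # illustrative subset of fields
    models: object = MISSING_SENTINEL
    max_seq_len: object = MISSING_SENTINEL

# Hypothetical early check: raise TypeError for any mandatory field that
# was never filled in, instead of failing later on first access.
def check_mandatory(cfg) -> None:
    missing = [f.name for f in fields(cfg)
               if getattr(cfg, f.name) == MISSING_SENTINEL]
    if missing:
        raise TypeError(f'Missing mandatory config values: {missing}')

try:
    check_mandatory(EvalConfig(models=[{'name': 'mpt'}]))
except TypeError as e:
    msg = str(e)              # max_seq_len was never provided
```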
* remove unnecessary ignore
* update unit tests
* update eval yamls
* Update train.py
* make log level optional again
* oopsie
* use keywords for arg clarity
* use keywords for arg clarity
* style
* style
* dist timeout
* resolve deeper conflict issues
* fix train.py
* fix registry
* fix dataloader
* fix train II
* fix dataloader and utils
* fix dictconfig
* skill issue
* add new keys
* remove pop_config
* fix
---------
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>