structured config for train.py and eval.py (#1051)
* first commit for structured config for train.py
* revamp configs
* wip latest issue
* reorder so mandatory attributes come first
* fix
* fix
* fix fix
* fix types
* fix dictconfig
* fix union of list|dict configs
* fix type annotation
* oops
* fixed configs
* add save ignore keys
* fix batch size kerfuffle
* fix dictconfig stuff
* fix dictconfig stuff again
* fix
* fix
* updated unit tests for variables
* last fix?
* if this test case does not pass I will venmo Mihir 0
* remove a 'not' -- eg. 'I am not going crazy'
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* set amp bf16 as default precision, etc
* temporarily wrap with dictconfig before ** migration
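The commit above describes an interim migration pattern: keep wrapping the raw config in a DictConfig-like object so legacy attribute-style access (`cfg.lr`) still works, while callsites are gradually moved to plain `**` keyword unpacking. A minimal stdlib-only sketch of that pattern, with a stand-in `AttrDict` in place of omegaconf's `DictConfig` and illustrative field names (not the actual llm-foundry code):

```python
# Illustrative stand-in for omegaconf's DictConfig: attribute-style access
# over a plain dict, so legacy callsites like `cfg.lr` keep working while
# new callsites migrate to `**cfg` keyword unpacking.
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as e:
            raise AttributeError(key) from e

def build_optimizer_kwargs(lr: float, weight_decay: float = 0.0) -> dict:
    # New-style callsite: plain keyword arguments, no config object.
    return {'lr': lr, 'weight_decay': weight_decay}

raw = {'lr': 1e-4, 'weight_decay': 0.01}
cfg = AttrDict(raw)                        # temporary wrapper during migration

legacy = cfg.lr                            # old attribute-style access
migrated = build_optimizer_kwargs(**cfg)   # new **-unpacking callsite
```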
* fix icl tasks
* fix
* fix activation checkpointing reentrant
* fix extraneous keys
* first round **
* fix?
* quick fsdp config fix
* updated yamls to make variables explicit
* remove precision from mandatory params list
* I expect many of these to fail in interesting ways
* fix test_model test cases with **
* fix many more test cases
* fix dictconfig objectification
* fix remaining test cases
* remove unneeded **
* fix test case
* changed back argument name
* fix
* ** for finetuning dataloader
* fix?
* fix dataloader
* fix
* fix finetuning dataloader
* fix build_text_dataloader
* left to my own devices
* fix packing
* fix typo
* fix padding test cases
* ignore extra parameters and warn
* fix style
* fix quality checks
* fix code quality
* pyright-fu
* fix
* just one more type constraint bro
* OmegaConf -> om
* rename variables for clarity
* revert file
* revert file II
* revert file III: revert of the sith
* peft revert file
* revert v_mpt
* last revert
* remove redundant checks
* deprecate
* make cleaner
* pyright is bullying me again
* further clean config_utils
* polish train
* polish train and eval
* fix dist
* fix style
* organize eval and train
* fix
* used helper function to make main cleaner
* fix stuff
* fix pyright
* added fix and explanation
* fix typo in unit test update smh
* Update llmfoundry/registry.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Update scripts/train/train.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* see if this fails
* reject name and device rather than ignoring
* pretrained is not a bool
* add validation to make sure the user doesn't set both
* forbid config keys
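The few commits above move from silently ignoring problem keys to rejecting them: keys the builder sets itself (`name`, `device`) are forbidden outright, and mutually exclusive options must not both be set. A hypothetical sketch of that validation; the forbidden-key set and the exclusive pair shown here are assumptions for illustration, not the exact llm-foundry rules:

```python
# Hypothetical validation helper: forbid keys the builder sets itself, and
# reject mutually exclusive pairs instead of silently ignoring one of them.
FORBIDDEN_KEYS = {'name', 'device'}                         # assumed keys
EXCLUSIVE_PAIRS = [('pretrained', 'init_from_checkpoint')]  # assumed pair

def validate_model_config(cfg: dict) -> None:
    present_forbidden = FORBIDDEN_KEYS & cfg.keys()
    if present_forbidden:
        raise ValueError(
            f'Config keys {sorted(present_forbidden)} are set by the builder '
            'and may not appear in the model config.')
    for a, b in EXCLUSIVE_PAIRS:
        if a in cfg and b in cfg:
            raise ValueError(f'Set at most one of {a!r} and {b!r}.')

validate_model_config({'pretrained': 'mpt-7b'})    # passes
try:
    validate_model_config({'name': 'mpt_causal_lm'})
except ValueError as e:
    err = str(e)                                   # forbidden key rejected
```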
* oops forgot eval
* address comments
* removed redundant check
* updated callsites not to use name
* fix
* validate extraneous keys in dataloader
* fix
* fix more
* fix III: revenge of the fix
* fix IV: a new hope
* fix V: the empire fixes back
* fixed some more types
* fix VI: return of the fix
* fix VII: the fix awakens
* fix VIII: the last bug
* fix
* final fix I think
* fixed
* fix style
* fix
* fix fix
* fix fix style
* icl task config
* fix train
* fix finetuning dataloader
* fix train types
* fix token counting
* fix train types
* oopsie
* fix straggler issues
* fix tests
* fix???
* fix hf v mpt gpu test and fmapi test
* pop device
* to_str_dict -> to_dict_recursive
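The rename above suggests a helper that recursively converts a nested config container into plain built-in types. A rough sketch of what a `to_dict_recursive`-style function might look like (the actual llm-foundry helper operates on omegaconf containers; this stdlib version illustrates the shape):

```python
from collections.abc import Mapping, Sequence

# Sketch of a `to_dict_recursive`-style helper: walk a nested config
# container and return plain dicts and lists all the way down, with
# string keys at every mapping level.
def to_dict_recursive(obj):
    if isinstance(obj, Mapping):
        return {str(k): to_dict_recursive(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        return [to_dict_recursive(v) for v in obj]
    return obj

nested = {'model': {'name': 'mpt', 'layers': [{'d_model': 128}]}}
plain = to_dict_recursive(nested)
```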
* fix this darn unit test one more time
* fix ComposerMPTCausalLM constructor invocation
* Delete tests/models/hf/test_hf_fsdp.py
* unwrap model in unit tests
* model.model.model.model.model
* abstract away dataclass construction
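"Abstract away dataclass construction" together with the earlier "ignore extra parameters and warn" commit points at a generic helper that builds a typed config dataclass from an untyped dict, warning on keys the dataclass does not declare. A hypothetical sketch, with an illustrative subset of `TrainConfig` fields:

```python
import warnings
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:            # illustrative subset of fields, not the real one
    model: dict
    max_duration: str
    seed: int = 17

# Hypothetical helper: construct a config dataclass from a raw dict,
# warning on (and dropping) extraneous keys rather than crashing.
def make_dataclass_from_dict(cls, cfg: dict):
    known = {f.name for f in fields(cls)}
    extras = set(cfg) - known
    if extras:
        warnings.warn(f'Ignoring extraneous config keys: {sorted(extras)}')
    return cls(**{k: v for k, v in cfg.items() if k in known})

train_cfg = make_dataclass_from_dict(
    TrainConfig,
    {'model': {'name': 'mpt'}, 'max_duration': '1ep', 'typo_key': 1},
)
```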
* updated docstrings and removed dictconfig from logging logic
* flag icl tasks required or not
* updated a couple yamls
* updated train and eval scripts
* un-delete global train batch size
* fix
* I don't understand why this doesn't work
* that was the sneakiest bug I've ever fixed
* try to fix the regression test
* remove device train grad accum
* fix validate config
* removed unused import
* use variables
* missing mandatory value fix
* use correct type of error
* fix
* import TrainConfig just in case?
* moved trainconfig and evalconfig into utils
* works
* no cheating
* dicts everywhere gah
* try no recursive just
* rename typed helpers
* fix the test cases with deep magic
* towards a peaceful resolution
* remove comments
* fix type warnings
* Update llmfoundry/utils/config_utils.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
* address low-hanging fruit
* remove peft wrapping extra model
* python :handshake: haskell
* dataset config should be dict
* just because omega starts with OMMMM does not mean it's zen
* fix
* fix
* structured settlement
* precision further down
* throws TypeError instead of MissingMandatoryValue or whatever
* remove debugging statement
* remove to_container calls everywhere
* wrap then unwrap
* pyright
* error early on missing mandatory values
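Combined with the earlier "throws TypeError instead of MissingMandatoryValue" commit, this suggests scanning the built config up front for unfilled mandatory fields and raising a plain `TypeError`, rather than letting omegaconf's `MissingMandatoryValue` surface later at first attribute access. A stdlib-only sketch of that check, using omegaconf's `'???'` missing-value marker and an illustrative `EvalConfig`:

```python
from dataclasses import dataclass, fields

MISSING_SENTINEL = '???'      # omegaconf-style marker for a missing value

@dataclass
class EvalConfig:             # illustrative subset of fields
    models: object = MISSING_SENTINEL
    max_seq_len: object = MISSING_SENTINEL

# Hypothetical early check: raise TypeError for any mandatory field that
# was never filled in, instead of failing later on first access.
def check_mandatory(cfg) -> None:
    missing = [f.name for f in fields(cfg)
               if getattr(cfg, f.name) == MISSING_SENTINEL]
    if missing:
        raise TypeError(f'Missing mandatory config values: {missing}')

try:
    check_mandatory(EvalConfig(models=[{'name': 'mpt'}]))
except TypeError as e:
    msg = str(e)              # max_seq_len was never provided
```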
* remove unnecessary ignore
* update unit tests
* update eval yamls
* Update train.py
* make log level optional again
* oopsie
* use keywords for arg clarity
* use keywords for arg clarity
* style
* style
* dist timeout
* resolve deeper conflict issues
* fix train.py
* fix registry
* fix dataloader
* fix train II
* fix dataloader and utils
* fix dictconfig
* skill issue
* add new keys
* remove pop_config
* fix
---------
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>