Prefix lm (#52)
* ICT zeroshot evaluation code
* made more generic, aligned with other tasks
* Fixed based on review recommendation
* fixed another issue
* implementing DPR
* implementation dpr
* adding dpr code
* removed comments
* DPR evaluation debugging
* DPR ongoing
* DPR finetune and evaluation
* fixing model evaluation of retriever
* added pre- and post-processing
* evaluation works!
* debugging DPR
* fix copy-n-paste error
remove erroneous arg.
* Typo fix in readme
* t5 fixes
* before cleaning the comments
* vit pipeline fixes
* cleaning the code
* additional cleaning
* renaming the folders
* Add temporary assert to finetuning until it can be fixed.
* Fixed issues with ICT pretraining
* updated the evaluation script for retriever
* added exit interval for finetuning
* updating the scripts
* updating no load rng
* updating script
* Update T5 scripts
* resolved hang issue
* fixed the tensor size mismatch issue
* fixed the evaluation hangs
* Adding readme
* Clean up README.md a bit
* addressed comments
* updated readme
* Basic handling of prefix lm by updating the mask
* Add prefix option to gpt temporarily and prevent it from using the custom kernel
* Add argument for prefix lm, in order to configure masking strategy
* Woops
* Add loss_on_targets_only flag, assert that the current prefix implementation only works with reset_attention_mask set to True, and attempt to fix empty slice issue
* Format
* Reverse renaming
* Allow prefix on partial document at the end
* WIP: add prefix per row feature
* Document the use of None
* Woops
* Handle empty document better
* We might not be able to concat empty tensors
* Handle empty tensor separately
* Debug
* Test
* Add loss masking as script argument
* Turns out deepspeed integration of attention matrices prevented dynamic masks
* Add more asserts
* Prefix can only see the prefix, it cannot see target
* Remove prefix-lm argument as we split the pretrain script
* Iz PR review
* Make masking row dependent when using prefix
* Revert "Merge remote-tracking branch 'origin/master' into prefix_lm"
This reverts commit d49d6e5e1d1074bb6abee914e4b31ae50ece9a4e, reversing
changes made to 28a712d42e071c50686a9a2eb4f1816dcdb0ef82.
* Tests (#1)
* WIP: test
* Still trying to figure out deepspeed
* WIP
* Test test
* Test how to setup deepspeed in unit tests
* Test something else
* Empty strings might be problematic
* Remove unnecessary arguments
* Woops
* Remove global variables at the end of each test and init deepspeed
* Woops
* Maybe adding classmethod
* Woops
* Add debug print to check that tear down happens
* Reset global variables before
* Let's test this
* Try something else
* WIP
* More fix
* More fix
* More stuff to fix
* We really want to compare vectors and not coordinates
* Reformat
* check something out
* fix test
* Remove prefix-lm flag as it's integrated
* Woops
* Add test for without reset attention mask
* Fix test for non reset attention mask
* Fix test
* Update code for prefix lm
Co-authored-by: Mostofa Patwary <mostofa.patwary@gmail.com>
Co-authored-by: Mostofa Patwary <mpatwary@nvidia.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Devrim <46989091+devrimcavusoglu@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Vijay Korthikanti <vkorthikanti@nvidia.com>
Co-authored-by: Jared Casper <jcasper@nvidia.com>
Co-authored-by: Mohammad Shoeybi <mshoeybi@nvidia.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>