Prefix lm (#52)
* ICT zeroshot evaluation code
* made more generic, aligned with other tasks
* Fixed based on review recommendation
* fixed another issue
* implementing DPR
* implementation dpr
* adding dpr code
* removed comments
* DPR evaluation debugging
* DPR ongoing
* DPR finetune and evaluation
* fixing model evaluation of retriever
* added pre- and post-processing
* evaluation works!
* debugging DPR
* fix copy-n-paste error
remove erroneous arg.
* Typo fix in readme
* t5 fixes
* before cleaning the comments
* vit pipeline fixes
* cleaning the code
* additional cleaning
* renaming the folders
* Add temporary assert to finetuning until it can be fixed.
* Fixed issues with ICT pretraining
* updated the evaluation script for retriever
* added exit interval for finetuning
* updating the scripts
* updating no load rng
* updating script
* Update T5 scripts
* resolved hang issue
* fixed the tensor size mismatch issue
* fixed the evaluation hangs
* Adding readme
* Clean up README.md a bit
* addressed comments
* updated readme
* Basic handling of prefix lm by updating the mask
* Add prefix option to gpt temporarily and prevent it from using the custom kernel
* Add argument for prefix lm, in order to configure masking strategy
* Woops
* Add loss_on_targets_only flag, assert that the current prefix implementation only works with reset_attention_mask set to True, and attempt to fix empty slice issue
* Format
* Reverse renaming
* Allow prefix on partial document at the end
* WIP: add prefix per row feature
* Document the use of None
* Woops
* Handle empty document better
* We might not be able to concat empty tensors
* Handle empty tensor separately
* Debug
* Test
* Add loss masking as script argument
* Turns out deepspeed integration of attention matrices prevented dynamic masks
* Add more asserts
* Prefix can only see the prefix, it cannot see target
* Remove prefix-lm argument as we split the pretrain script
* Iz PR review
* Make masking row dependent when using prefix
* Revert "Merge remote-tracking branch 'origin/master' into prefix_lm"
This reverts commit d49d6e5e1d1074bb6abee914e4b31ae50ece9a4e, reversing
changes made to 28a712d42e071c50686a9a2eb4f1816dcdb0ef82.
* Tests (#1)
* WIP: test
* Still trying to figure out deepspeed
* WIP
* Test test
* Test how to setup deepspeed in unit tests
* Test something else
* Empty strings might be problematic
* Remove unnecessary arguments
* Woops
* Remove global variables at the end of each test and init deepspeed
* Woops
* Maybe adding classmethod
* Woops
* Add debug print to check that tear down happens
* Reset global variables before
* Let's test this
* Try something else
* WIP
* More fix
* More fix
* More stuff to fix
* We really want to compare vectors and not coordinates
* Reformat
* check something out
* fix test
* Remove prefix-lm flag as it's integrated
* Woops
* Add test for without reset attention mask
* Fix test for non reset attention mask
* Fix test
* Update code for prefix lm
Co-authored-by: Mostofa Patwary <mostofa.patwary@gmail.com>
Co-authored-by: Mostofa Patwary <mpatwary@nvidia.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Devrim <46989091+devrimcavusoglu@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Vijay Korthikanti <vkorthikanti@nvidia.com>
Co-authored-by: Jared Casper <jcasper@nvidia.com>
Co-authored-by: Mohammad Shoeybi <mshoeybi@nvidia.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>