Megatron-DeepSpeed
[PrefixLM] Figuring out why prefix lm is doing poorly on short context
#169
Merged

thomasw21 merged 5 commits into main from thomas/fix_aggregated_loss
thomasw21 requested a review from ibeltagy 4 years ago
thomasw21 requested a review from TevenLeScao 4 years ago
thomasw21 commented on 2021-10-29
thomasw21 Loss normalisation should be invariant to number of tokens trained on…
14653b7e
thomasw21 Make cross entropy use a microbatch independent normalisation factor …
79fbda19
thomasw21 Loss mask is not a boolean tensor
0d4308b7
thomasw21 force pushed from e6e9de31 to 0d4308b7 4 years ago
thomasw21 Make it mergeable, ie does not change the behaviour of gpt
6d8608e3
thomasw21 marked this pull request as ready for review 4 years ago
thomasw21 requested a review from stas00 4 years ago
thomasw21 changed the title from "[WIP] Figuring out why prefix lm is doing poorly on short context" to "[PrefixLM] Figuring out why prefix lm is doing poorly on short context" 4 years ago
stas00 approved these changes on 2021-11-05
thomasw21 Allow to run loss_on_targets_only=False for prefix lm (#179)
4c4adf91
thomasw21 merged 6d146b5f into main 4 years ago
thomasw21 deleted the thomas/fix_aggregated_loss branch 3 years ago