Megatron-DeepSpeed
[PrefixLM] Figuring out why prefix lm is doing poorly on short context
#169

Merged

thomasw21 merged 5 commits into main from thomas/fix_aggregated_loss

thomasw21 requested a review from ibeltagy 4 years ago

thomasw21 requested a review from TevenLeScao 4 years ago

thomasw21 commented on 2021-10-29

Loss normalisation should be invariant to number of tokens trained on…

14653b7e

Make cross entropy use a microbatch independent normalisation factor …

79fbda19

Loss mask is not a boolean tensor

0d4308b7

thomasw21 force pushed from e6e9de31 to 0d4308b7 4 years ago

Make it mergeable, ie does not change the behaviour of gpt

6d8608e3

thomasw21 marked this pull request as ready for review 4 years ago

thomasw21 requested a review from stas00 4 years ago

thomasw21 changed the title ~~[WIP] Figuring out why prefix lm is doing poorly on short context~~ [PrefixLM] Figuring out why prefix lm is doing poorly on short context 4 years ago

stas00 approved these changes on 2021-11-05

Allow to run loss_on_targets_only=False for prefix lm (#179)

4c4adf91

thomasw21 merged 6d146b5f into main 4 years ago

thomasw21 deleted the thomas/fix_aggregated_loss branch 3 years ago

Reviewers

stas00

ibeltagy

TevenLeScao

