[PrefixLM] Figuring out why prefix lm is doing poorly on short context #169
Loss normalisation should be invariant to number of tokens trained on…
14653b7e
Make cross entropy use a microbatch independent normalisation factor …
79fbda19
Loss mask is not a boolean tensor
0d4308b7
thomasw21 force pushed from e6e9de31 to 0d4308b7 4 years ago
Make it mergeable, ie does not change the behaviour of gpt
6d8608e3
thomasw21 marked this pull request as ready for review 4 years ago
thomasw21 changed the title from "[WIP] Figuring out why prefix lm is doing poorly on short context" to "[PrefixLM] Figuring out why prefix lm is doing poorly on short context" 4 years ago
stas00 approved these changes on 2021-11-05
Allow to run loss_on_targets_only=False for prefix lm (#179)
4c4adf91
thomasw21 merged 6d146b5f into main 4 years ago
thomasw21 deleted the thomas/fix_aggregated_loss branch 3 years ago
Assignees: No one assigned