transformers
Fix how we compute the final non-padding token for ForSequenceClassification models
#35911

Merged

Commits

Fix how we compute the final non-padding token for Gemma (and probably other models)

Rocketknight1 committed 1 year ago
.size() -> .shape[]

Rocketknight1 committed 1 year ago
Propagating changes to other models

Rocketknight1 committed 1 year ago
Propagating changes to other models

Rocketknight1 committed 1 year ago
Change it for all ForSequenceClassification models

Rocketknight1 committed 1 year ago
Fix batch dim

Rocketknight1 committed 1 year ago
More TF fixes

Rocketknight1 committed 1 year ago
Copy the TF fix around as well

Rocketknight1 committed 1 year ago
Correct layer name for TFCTRL

Rocketknight1 committed 1 year ago
Cleaner .to()

Rocketknight1 committed 1 year ago
Clean up the nested if-else

Rocketknight1 committed 1 year ago
Use argmax() instead of .max().values

Rocketknight1 committed 1 year ago
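
The commit messages above point to one idea: find the rightmost non-padding token in each sequence by taking an argmax over position indices that have been masked by "is not padding". What follows is only a minimal sketch of that idea, written with plain Python lists rather than tensors so it runs without dependencies. The function name `last_non_pad_index` and the sample token IDs are hypothetical and not taken from the PR, and the actual model code may differ in detail.

```python
def last_non_pad_index(input_ids, pad_token_id):
    """Return, for each row, the index of the rightmost non-padding token.

    Illustrative sketch only (hypothetical helper, not the library's code).
    input_ids: list of equal-length lists of token IDs.
    """
    result = []
    for row in input_ids:
        # Multiply each position by a 0/1 mask: padding positions score 0,
        # non-padding positions score their own index.
        scored = [pos * int(tok != pad_token_id) for pos, tok in enumerate(row)]
        # argmax: index of the largest score (first occurrence on ties),
        # which is the rightmost non-padding position.
        result.append(max(range(len(scored)), key=scored.__getitem__))
    return result


batch = [
    [5, 6, 7, 0],  # right-padded
    [0, 0, 5, 6],  # left-padded
    [5, 0, 6, 0],  # pad ID also appears mid-sequence
]
print(last_non_pad_index(batch, pad_token_id=0))  # prints [2, 3, 2]
```

Because the mask is applied to position indices rather than searching for the first padding token, the same computation works for left-padded and right-padded batches. In this sketch a row made up entirely of padding falls back to index 0.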