PR #17973 XLA train step fixes

Rocketknight1 merged 21 commits into main from xla_train_step_fixes

Copy inputs to train and test step before modifying them, as this bre…

b924b407

Add XLA tests, fix our loss functions to be XLA-compatible

0414cedc

Rocketknight1 requested a review from gante 3 years ago

Rocketknight1 requested a review from LysandreJik 3 years ago

Rocketknight1 requested a review from sgugger 3 years ago

make fixup

167fd324

LysandreJik requested a review from ydshieh 3 years ago

ydshieh commented on 2022-07-01

gante commented on 2022-07-01

Update loss computation test to expect vector of per-sample losses

e01286d3

Patch loss for TFLED

3e537933

Patch loss for TFAlbert

43ce3f58

sgugger commented on 2022-07-01

Add a tf_legacy_loss config flag that enables old loss functions

4060777a

sgugger commented on 2022-07-01

Stop using config.get() because it's not a dict

391b050d

Skip loss computation test for RAG because its loss is very strange a…

8035a272

make fixup

58e3db87

sgugger approved these changes on 2022-07-01

Add XLA-compatible RAG loss

3b9fe743

Fix dtype of loss mask for TFAlbert

db79798f

Fix test for XLNet too because it overrides the default one

9a6b7b58

make fixup

92a4e798

Fix config test

6021439b

No more depending on GPU NaN behaviour

a46da255

Add test, avoid potential zero division

64c0e77e

Fix test item assignment

d34a3b2f

Fix loss computation masking test

32078b24

make fixup

a19ee4fd

Fix dtype bugs

f17136c8

Rocketknight1 merged d6cec458 into main 3 years ago

Rocketknight1 deleted the xla_train_step_fixes branch 3 years ago

patrickvonplaten commented on 2022-07-04

Reviewers

sgugger

ydshieh

patrickvonplaten

gante

LysandreJik

transformers XLA train step fixes #17973 Merged
