BetterTransformer support training & autocast for all archs (#1225)
* support training
* encoders and encoder+decoder all work
* warning about training decoders with padding
* leave the backward pass for some archs to another PR
* nit
* fix tests
* hopefully tests pass
* fix
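The commits above enable training and autocast for BetterTransformer-converted models. As a minimal sketch of the pattern this unlocks (assuming only PyTorch, and using a plain `nn.TransformerEncoderLayer` stand-in rather than the actual Optimum conversion), a fused transformer layer can now be trained under autocast:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a BetterTransformer-converted encoder layer:
# train it under autocast and backprop through it.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
layer.train()  # training mode, as enabled by this PR for converted models
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(2, 5, 32)  # (batch, seq_len, d_model)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = layer(x)
    loss = out.pow(2).mean()

loss.backward()   # backward now supported for encoders and encoder+decoder archs
optimizer.step()
```

Note that, per the commits, training decoders with padded inputs emits a warning, and the backward pass for some architectures is deferred to another PR.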