Improve BERT-like models' performance with better self-attention (#9124)
* Improve BERT-like models' attention layers (see the first sketch after this list)
* Apply style
* Put back error raising instead of asserts (see the second sketch after this list)
* Update template
* Fix copies
* Raise ValueError instead of using assert in MPNet
* Restore the copy check for the Intermediate layer in Longformer
* Update Longformer
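
For context, the sketch below shows the scaled dot-product, multi-head self-attention pattern that BERT-like models use. The class and method names are illustrative assumptions, not the `transformers` implementation touched by this PR; it is only meant to show which layer the changes target.

```python
import math

import torch
import torch.nn as nn


class SimpleSelfAttention(nn.Module):
    """Minimal BERT-style multi-head self-attention (illustrative only)."""

    def __init__(self, hidden_size: int, num_attention_heads: int):
        super().__init__()
        # Divisibility check omitted here; see the next sketch for the
        # raise-instead-of-assert version of that guard.
        self.num_attention_heads = num_attention_heads
        self.attention_head_size = hidden_size // num_attention_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden) -> (batch, heads, seq_len, head_size)
        batch, seq_len, _ = x.size()
        x = x.view(batch, seq_len, self.num_attention_heads, self.attention_head_size)
        return x.permute(0, 2, 1, 3)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        query = self._split_heads(self.query(hidden_states))
        key = self._split_heads(self.key(hidden_states))
        value = self._split_heads(self.value(hidden_states))

        # Scaled dot-product attention scores.
        scores = torch.matmul(query, key.transpose(-1, -2))
        scores = scores / math.sqrt(self.attention_head_size)
        probs = nn.functional.softmax(scores, dim=-1)

        # Weighted sum of values, then merge the heads back into the hidden dim.
        context = torch.matmul(probs, value).permute(0, 2, 1, 3).contiguous()
        batch, seq_len = context.size(0), context.size(1)
        return context.view(batch, seq_len, -1)


# Example usage: a (batch=2, seq_len=8, hidden=64) input with 4 heads.
layer = SimpleSelfAttention(hidden_size=64, num_attention_heads=4)
output = layer(torch.randn(2, 8, 64))  # -> shape (2, 8, 64)
```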
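
The bullets about error raising refer to replacing `assert` statements with explicit exceptions. The snippet below is a minimal sketch of that pattern with a hypothetical helper name; it is not copied from the MPNet code.

```python
def check_heads_divide_hidden_size(hidden_size: int, num_attention_heads: int) -> None:
    # Before: assert hidden_size % num_attention_heads == 0
    # An explicit raise keeps the check active even under `python -O`,
    # which strips assert statements, and gives users a clearer message.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"The hidden size ({hidden_size}) is not a multiple of the number "
            f"of attention heads ({num_attention_heads})"
        )


check_heads_divide_hidden_size(768, 12)    # passes silently
# check_heads_divide_hidden_size(768, 10)  # would raise ValueError
```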