[refactor] set attention implementation (#38974)
* update
* fix some tests
* init from config changes it in-place, so add deepcopy in tests
* fix modernbert
* don't delete this config attr
* update
* style and copies
* skip tests in generation
* fix style
* accidentally removed flash-attn-3, revert
* docs
* forgot about flags set to False
* fix copies
* address a few comments
* fix copies
* custom code backward compatibility (BC)