[`Flash Attention 2`] Add flash attention 2 for GPT-NeoX (#26463)
* add flash-attn-2 support for GPT-NeoX (see the usage sketch after this list)
* fixup
* add comment
* revert
* fixes
* update docs
* comment
* again
* fix copies
* add plot + fix copies
* Update docs/source/en/model_doc/gpt_neox.md
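
A minimal usage sketch of the feature this PR enables, assuming a CUDA GPU, the `flash-attn` package installed, and half-precision weights. The checkpoint name, prompt, and generation settings are illustrative assumptions, not taken from the PR; the `attn_implementation="flash_attention_2"` flag is the standard opt-in in recent Transformers releases (earlier versions exposed the same switch as `use_flash_attention_2=True`).

```python
# Sketch: loading a GPT-NeoX checkpoint with Flash Attention 2 enabled.
# Requires a CUDA device, the flash-attn package, and fp16/bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # illustrative GPT-NeoX checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # FA2 needs fp16 or bf16
    attn_implementation="flash_attention_2",  # opt in to the Flash Attention 2 path
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```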