[ESM] support attention API (#40370)
* ESM supports attention API
* supports flags
* fix tests
* fix copies
* another fixup needed after fixing tests
* fix tests and make sure Evolla copied everything
* fix
* order
* forgot about "is_causal" for fa2
* cross attention can't be causal