[CUDA] Update sm check for flash attention (#24584)
### Description
Currently, flash attention is only enabled for sm8x and sm90, which means Blackwell GPUs will not use flash attention. This change enables flash attention for sm > 90 as well.
Note that the flash attention implementation is not optimized for Blackwell, but it should still be able to run on Blackwell GPUs.
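Below is a minimal sketch of the kind of compute-capability gate this change relaxes. The function name, parameters, and exact thresholds are hypothetical illustrations, not the actual ONNX Runtime code.

```cpp
// Hypothetical illustration of relaxing the SM check (not the actual ORT code).
bool UseFlashAttention(int sm_major, int sm_minor) {
  const int sm = sm_major * 10 + sm_minor;

  // Before: only Ampere/Ada (sm 8x) and Hopper (sm90) were allowed.
  // const bool supported = (sm >= 80 && sm < 90) || sm == 90;

  // After: also allow sm > 90 (e.g. Blackwell), even though the kernels
  // are not yet tuned for those newer architectures.
  const bool supported = sm >= 80;
  return supported;
}
```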
Future work:
* Integrate flash attention for Hopper:
https://github.com/Dao-AILab/flash-attention/tree/main/hopper
* Integrate FMHA for Blackwell:
https://github.com/NVIDIA/cutlass/tree/main/examples/77_blackwell_fmha
* Update cuDNN and the cuDNN frontend to the latest version (so that we can use
cuDNN flash attention for Blackwell).
### Motivation and Context
ORT GenAI is slow on the RTX 5090.