Add flash attention v3 (#2381)
Summary:
First, install the Flash Attention 3 (Hopper) kernel:
```
python install.py --userbenchmark triton --flash
```
Second, run the following command:
```
$ python run_benchmark.py triton --op flash_attention --only flash_v2,flash_v3 --num-inputs 1 --metrics tflops
[00:00<00:00, 5.70it/s]
  SeqLen    flash_v2-tflops    flash_v3-tflops
--------  -----------------  -----------------
     128            49.2482             33.825
```
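For context on what the `tflops` metric measures: attention throughput is conventionally derived from the two `seqlen x seqlen` matmuls in the kernel (`Q @ K^T` and `P @ V`). A minimal sketch of that arithmetic follows; the batch size, head count, head dimension, and latency below are illustrative assumptions, not the benchmark's actual configuration.

```python
def attention_tflops(batch: int, heads: int, seqlen: int, head_dim: int,
                     latency_ms: float, causal: bool = False) -> float:
    """Estimate attention TFLOPS from problem shape and measured latency.

    Illustrative formula only -- not the benchmark harness's code.
    """
    # Two matmuls per head, each contributing 2 * seqlen^2 * head_dim FLOPs:
    #   Q @ K^T : (seqlen, head_dim) @ (head_dim, seqlen)
    #   P @ V   : (seqlen, seqlen)   @ (seqlen, head_dim)
    flops = 4 * batch * heads * seqlen * seqlen * head_dim
    if causal:
        flops //= 2  # causal masking skips roughly half the work
    return flops / (latency_ms * 1e-3) / 1e12

# Hypothetical shape and latency: batch=8, heads=16, seqlen=128, head_dim=64, 0.01 ms
print(round(attention_tflops(8, 16, 128, 64, 0.01), 3))  # -> 53.687
```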
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2381
Reviewed By: manman-ren
Differential Revision: D59871536
Pulled By: xuzhao9
fbshipit-source-id: 23bf32d18bda5004bf614504e40d2c33ad8966d3