Fix Attention Runtime Error for CLIP model (#17729)
### Description
The following condition check is incorrect:
```
if (is_unidirectional_ && enable_fused_causal_attention_) { // GPT
}
else { // BERT
}
```
Change it to:
```
if (is_unidirectional_) { // GPT
}
else { // BERT
}
```
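As a rough illustration of why the old condition misroutes causal models, here is a minimal, self-contained sketch. The `Path` enum and the `SelectPathBefore`/`SelectPathAfter` helpers are hypothetical names made up for this example and are not the actual onnxruntime code; only the two boolean conditions are taken from the snippets above.

```cpp
// Hypothetical sketch: Path, SelectPathBefore and SelectPathAfter are
// illustrative names, not identifiers from onnxruntime.
enum class Path { kCausal, kNonCausal };

// Old condition: a unidirectional (causal) model falls through to the
// non-causal (BERT) branch whenever fused causal attention is disabled.
constexpr Path SelectPathBefore(bool is_unidirectional,
                                bool enable_fused_causal_attention) {
  return (is_unidirectional && enable_fused_causal_attention)
             ? Path::kCausal
             : Path::kNonCausal;
}

// New condition: the branch depends only on whether the model is causal.
constexpr Path SelectPathAfter(bool is_unidirectional,
                               bool /*enable_fused_causal_attention*/) {
  return is_unidirectional ? Path::kCausal : Path::kNonCausal;
}

// Causal model with fused causal attention disabled: the case that is misrouted.
static_assert(SelectPathBefore(true, false) == Path::kNonCausal,
              "old check sends a causal model down the non-causal branch");
static_assert(SelectPathAfter(true, false) == Path::kCausal,
              "new check keeps a causal model on the causal branch");

// Non-causal (BERT-style) models are routed the same way by both checks.
static_assert(SelectPathBefore(false, true) == SelectPathAfter(false, true),
              "non-causal routing is unchanged");
```

Under the old check, the outcome for a causal model depends on the fused causal attention flag, which would also explain why turning that flag on avoids the problem. What the causal branch does internally when the flag is off is not modeled here.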
Another workaround is to enable fused causal attention by setting the
environment variable `ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1` before
running Stable Diffusion.
### Motivation and Context
Without the fix, the optimized CLIP model of Stable Diffusion fails with
the following error when running the Attention node:
```
2023-09-24 16:15:31.206037898 [E:onnxruntime:,
sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned
while running Attention node. Name:'Attention_0' Status Message:
/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/mha_runner.cu:207
bool
onnxruntime::contrib::cuda::FusedMHARunnerFP16v2::mhaImpl::is_flash_attention(int)
const interface->mHasCausalMask == false was false.
```
Note that the bug has existed for a long time. It only surfaced now
because we recently added a fusion for CLIP, which triggers the error.
We will add a comprehensive test for causal attention later to catch
such corner cases.