Fix SigLIP casual mask bug (#25360)
### Description
<!-- Describe your changes. -->
SigLIP architecture inside the vision encoder should not use a causal
mask on the attention. This change will fix Phi 4 MM accuracy issues we
have seen.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>