Add unit tests for Phi vision (#23357)
### Description
This PR adds unit tests for [fusing the vision
components](https://github.com/microsoft/onnxruntime/pull/20721) of
Phi-3 vision and Phi-3.5 vision.
### Motivation and Context
Many multi-modal models use CLIP or a CLIP variant as part of their
encoders. These fusion unit tests guard against regressions: they verify
that the vision components of Phi-3 vision and Phi-3.5 vision can still
be fused after existing fusions are modified to support more models.