[QNN EP] Optimize Session Creation Time (#26628)
### Description
`qnn::utils::TwoDimensionTranspose` is the bottleneck during session
creation because it transposes with a nested for loop that calls `memcpy`
once per element. For large weights this is very slow, and
`ReshapeGemmFusion` calls it three times in total:
```
QnnModel::ComposeGraph → ReshapeGemmFusion::AddToModelBuilder → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose
QNNExecutionProvider::GetCapability → QNNExecutionProvider::GetSupportedNodes → ReshapeGemmFusion::IsSupported → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose (do QNN OP validation)
QNNExecutionProvider::GetCapability → QNNExecutionProvider::GetSupportedNodes → onnxruntime::qnn::ReshapeGemmFusion::IsSupported → CreateOrValidateOnQnn → qnn::utils::TwoDimensionTranspose (do QNN OP validation)
```
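For illustration, a per-element transpose of the kind described above might look like the sketch below. `NaiveTwoDimensionTranspose` is a hypothetical name and this is not the QNN EP source; it only shows why the cost scales with rows × cols. For a [2304, 700] weight that is 1,612,800 `memcpy` calls per transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch, not the QNN EP implementation: transposes a row-major
// [rows, cols] tensor into [cols, rows], copying one element at a time.
std::vector<uint8_t> NaiveTwoDimensionTranspose(const std::vector<uint8_t>& src,
                                                size_t rows, size_t cols,
                                                size_t elem_size) {
  std::vector<uint8_t> dst(src.size());
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      // Element (r, c) of the input becomes element (c, r) of the output.
      const size_t src_off = (r * cols + c) * elem_size;
      const size_t dst_off = (c * rows + r) * elem_size;
      std::memcpy(dst.data() + dst_off, src.data() + src_off, elem_size);
    }
  }
  return dst;
}
```

Each call touches every element once, so three calls on the same weight mean roughly 4.8 million small copies for the tensor measured below.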
This change avoids the heavy copy by using a dummy tensor when only shape
validation is required, i.e. in the two `GetCapability` validation paths above.
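The idea of skipping the copy on validation-only paths can be sketched as follows. All names here (`WeightTensor`, `PrepareTransposedWeight`, `validate_only`) are hypothetical, and the assumption that validation only needs the transposed shape plus a correctly sized placeholder buffer is ours, not taken from the QNN EP code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical tensor holder; not a QNN or ONNX Runtime type.
struct WeightTensor {
  std::vector<uint32_t> shape;  // {rows, cols}
  std::vector<uint8_t> data;    // row-major bytes
};

// Returns the transposed weight. When validate_only is true (assumed to be
// the GetCapability validation paths), only the shape is computed and the
// data is a zero-filled placeholder, so the per-element copy is skipped.
WeightTensor PrepareTransposedWeight(const WeightTensor& w, size_t elem_size,
                                     bool validate_only) {
  const size_t rows = w.shape[0];
  const size_t cols = w.shape[1];
  WeightTensor out;
  out.shape = {w.shape[1], w.shape[0]};  // transposed shape, needed in both modes
  out.data.assign(w.data.size(), 0);     // same byte size, one bulk fill
  if (validate_only) {
    return out;  // dummy tensor: correct shape, placeholder contents
  }
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      std::memcpy(out.data.data() + (c * rows + r) * elem_size,
                  w.data.data() + (r * cols + c) * elem_size, elem_size);
    }
  }
  return out;
}
```

Under this sketch, the two `GetCapability` call chains listed earlier would pass `validate_only = true` and only the `ComposeGraph` path would pay for the real transpose.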
### Motivation and Context
The measurements below were taken on a tensor with shape [2304, 700]:

| Variant | TwoDimensionTranspose_1 | TwoDimensionTranspose_2 | TwoDimensionTranspose_3 | Session creation time |
|---------|-------------------------|-------------------------|-------------------------|-----------------------|
| Original | 88.39 ms | 57.80 ms | 53.09 ms | 9.41871 s |
| Avoid 2 memcpy | 51.52 ms | 12.00 ms | 8.05 ms | 9.05975 s |
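
As a quick check on the table, the relative reduction in session creation time works out to:

$$
\frac{9.41871\,\text{s} - 9.05975\,\text{s}}{9.41871\,\text{s}} = \frac{0.35896\,\text{s}}{9.41871\,\text{s}} \approx 3.8\%
$$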
---------
Co-authored-by: Kuan-Yu Lin <kuanyul@qti.qualcomm.com>