Share more constant initializers (#15461)
### Share more constant initializers.
The `ConstantSharing` transformer originally handled only single-value
initializers (scalar or 1D).
This PR extends sharing to more cases so that the common subexpression
elimination transformer can remove more duplicated nodes.
Originally, we used a single
`vector<std::variant<float,half,int32,int64>>` to store the different scalar
values. In this PR, we create an unordered map whose key is
data_type + rank + element count and whose value is a vector of
`InitializerValue`.
For a given initializer that fulfils the sharing conditions, we first
look up the corresponding vector of `InitializerValue` by its
<data_type + rank + element count>, then search that vector to check whether
the constant tensor already exists. The search returns a value id,
which is combined with <data_type + rank + element count> to form the
pattern key that decides which tensor to reuse (legacy code).
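
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `InitializerValue` struct here is a simplified stand-in that only holds raw bytes, and `MakeValue`, `MakeBucketKey`, `ValueIdTable`, and `MakePatternKey` are hypothetical names introduced for the example. The string key format is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for `InitializerValue` (assumption): the raw bytes of a constant tensor.
struct InitializerValue {
  std::vector<std::uint8_t> bytes;
  bool operator==(const InitializerValue& other) const { return bytes == other.bytes; }
};

// Hypothetical helper: copies typed tensor data into an InitializerValue.
template <typename T>
InitializerValue MakeValue(const std::vector<T>& data) {
  InitializerValue v;
  v.bytes.resize(data.size() * sizeof(T));
  if (!v.bytes.empty()) {
    std::memcpy(v.bytes.data(), data.data(), v.bytes.size());
  }
  return v;
}

// Hypothetical helper: builds the <data_type + rank + element count> key.
std::string MakeBucketKey(int data_type, int rank, std::size_t element_count) {
  assert(rank >= 0);
  return std::to_string(data_type) + "_" + std::to_string(rank) + "_" +
         std::to_string(element_count);
}

class ValueIdTable {
 public:
  // Returns the index of `value` inside its bucket, appending it if it has not been seen yet.
  std::size_t GetOrAddValueId(const std::string& bucket_key, const InitializerValue& value) {
    std::vector<InitializerValue>& bucket = buckets_[bucket_key];
    for (std::size_t i = 0; i < bucket.size(); ++i) {
      if (bucket[i] == value) return i;  // constant tensor already exists
    }
    bucket.push_back(value);
    return bucket.size() - 1;
  }

 private:
  std::unordered_map<std::string, std::vector<InitializerValue>> buckets_;
};

// Hypothetical helper: bucket key plus value id forms the pattern key used to pick the tensor to reuse.
std::string MakePatternKey(const std::string& bucket_key, std::size_t value_id) {
  return bucket_key + "_" + std::to_string(value_id);
}
```

With this sketch, two initializers with the same data type, rank, element count, and contents map to the same pattern key, while a tensor with different contents in the same bucket gets a new value id.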
### Motivation and Context
One example we see here is:
```mermaid
stateDiagram
[*] --> LayerNorm(b,s,64)
LayerNorm(b,s,64) --> Reshape1
Shape1_Const[b*s,64] --> Reshape1
LayerNorm(b,s,64) --> Reshape2
Shape2_Const[b*s,64] --> Reshape2
Reshape1 --> AttentionSubGraph
Reshape2 --> Add
AttentionSubGraph--> Add
Add --> [*]
```
Ideally, CommonSubexpressionElimination would remove one of `Reshape1` and
`Reshape2`. However, because `Shape1_Const` and `Shape2_Const` are different
`NodeArg*`, it does not remove the duplication.
This is only one example; removing such duplication opens up more
opportunities to apply graph transformations.
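
To illustrate why the shape constants block the elimination, here is a toy sketch. It assumes a simplified duplicate check that compares the op type and the input pointers; the `NodeArg` and `Node` structs and `IsDuplicate` are hypothetical stand-ins, not the real onnxruntime types or the actual CommonSubexpressionElimination logic.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins (assumption), not the real onnxruntime classes.
struct NodeArg {
  std::string name;
};

struct Node {
  std::string op_type;
  std::vector<const NodeArg*> inputs;
};

// Toy duplicate check: two nodes match only if they have the same op type
// and their inputs are the very same NodeArg pointers.
bool IsDuplicate(const Node& a, const Node& b) {
  return a.op_type == b.op_type && a.inputs == b.inputs;
}
```

Under this check, `Reshape1` and `Reshape2` only compare equal once both take the same shared shape initializer as their second input, which is what sharing the constant achieves.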