Support Graph Input and Initializer for GatherToSplit Fusion (#18412)
Support graph input and initializer for GatherToSplit fusion. Previously
the fusion required the Gather nodes to consume the output of another
node, so it did not apply when their shared input was a graph input or
an initializer.
This helps training of models that contain this pattern, because the
final graph no longer contains GatherGrad, whose kernel implementation
is very inefficient.
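
For readers unfamiliar with the fusion, the sketch below illustrates the
equivalence it appears to rely on: several Gather nodes that read the same
tensor with scalar indices along one axis yield the same results as one
Split along that axis followed by a Squeeze per output. This is an
assumption-laden illustration, not the ONNX Runtime implementation: plain
Python lists stand in for tensors, and the helper names are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for tensors, and the
# helper names below are hypothetical, not ONNX Runtime APIs.

def gather_axis1(data, index):
    """Gather with a scalar index along axis 1: picks one column and drops the axis."""
    return [row[index] for row in data]

def split_axis1(data):
    """Split along axis 1 into pieces of size 1, keeping the axis."""
    width = len(data[0])
    return [[[row[i]] for row in data] for i in range(width)]

def squeeze_axis1(piece):
    """Remove the size-1 axis 1 left behind by the split."""
    return [row[0] for row in piece]

# `weights` plays the role of a graph input or initializer shared by all Gathers.
weights = [[1, 2, 3],
           [4, 5, 6]]

# One Gather per index along axis 1 (the pattern before fusion).
gathered = [gather_axis1(weights, i) for i in range(3)]

# One Split plus a Squeeze per output (the pattern after fusion).
fused = [squeeze_axis1(piece) for piece in split_axis1(weights)]
```

Both lists hold the same three columns, which is why the Gather nodes can
be replaced without changing the forward result.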