MatMul Gradient optimization for dB when B's is 2D tensor (#4899)
* Optimized MatMulGrad for dB when B's shape is 2D
* Refactor for ConstantScalarNode
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>