Add FusedMatMul contrib op (#5213)
* fix transformer bug
* fuse cpu kernel for transposescalematmul and matmul
* fuse transpose_scale_matmul cpu kernel with matmul
* fix test
* Add FusedMatMul Contrib Op
* fix test
* fix typo
* more updates per review