Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)
* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm