onnxruntime
afa89566 - Using cublasGemmBatchedEx/cublasGemmStridedBatchedEx for training (#4731)

Commit
5 years ago
Using cublasGemmBatchedEx/cublasGemmStridedBatchedEx for training (#4731)

* use cublas extension API for fp16
* Using cublasGemmBatchedEx/cublasGemmStridedBatchedEx for training

To avoid accuracy loss, the accumulation needs to be done in FP32 for training.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
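
The reason for accumulating in FP32 can be illustrated with a small CPU-side sketch. This is not the cuBLAS call from the commit. It is a pure-Python model that assumes the accumulator is rounded after every multiply-add, and the helper names (`to_fp16`, `to_fp32`, `dot`) and the input values are hypothetical. It computes the same dot product of FP16 inputs twice, once with an FP16 accumulator and once with an FP32 accumulator whose result is stored as FP16.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_acc):
    """Dot product whose running sum is rounded by round_acc after every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc


# Hypothetical inputs: a long reduction dimension with small FP16 values.
k = 4096
a = [to_fp16(0.1)] * k
b = [to_fp16(0.1)] * k

exact = sum(x * y for x, y in zip(a, b))  # float64 reference
fp16_acc = dot(a, b, to_fp16)             # accumulator kept in FP16
fp32_acc = to_fp16(dot(a, b, to_fp32))    # FP32 accumulator, FP16 output

print(f"reference:         {exact:.4f}")
print(f"FP16 accumulation: {fp16_acc:.4f}  (error {abs(fp16_acc - exact):.4f})")
print(f"FP32 accumulation: {fp32_acc:.4f}  (error {abs(fp32_acc - exact):.4f})")
```

In this model the FP16 accumulator stops growing once the running sum is large enough that each new product is smaller than half the spacing between adjacent FP16 values, so later terms are rounded away. The FP32 accumulator keeps enough precision to absorb every term and only loses precision in the final conversion to FP16.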