[ROCm] enable scaled_gemm (#117822)
Enable scaled_gemm on ROCm using hipBLASLt. As of ROCm 6.0, HIPBLASLT_MATMUL_DESC_AMAX_D_POINTER is not supported. As a work-around, the absmax is computed by a separate operation on the output buffer after the GEMM, which makes the absmax result less accurate. For this reason, the feature should be considered beta/preview.
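The sketch below is a minimal, pure-Python illustration of why taking the absmax of the output buffer can lose accuracy. It is not the hipBLASLt or PyTorch implementation and rests on several assumptions. First, D is rounded to a low-precision format when it is written out, which the hypothetical `quantize` helper emulates by keeping only `mantissa_bits` significand bits (it ignores exponent range and saturation). Second, scaling factors are omitted. Third, a fused AMAX_D epilogue would see the unrounded values. The names `quantize` and `gemm_with_amax_workaround` are illustrative only.

```python
import math


def quantize(x: float, mantissa_bits: int) -> float:
    """Hypothetical stand-in for a low-precision output dtype.

    Rounds x to (mantissa_bits + 1) significant bits (explicit bits plus the
    implicit leading bit). Exponent range and saturation are ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * steps) / steps, e)


def matmul(a, b):
    """Plain row-by-column matrix product in full (double) precision."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gemm_with_amax_workaround(a, b, mantissa_bits):
    """Hypothetical sketch: GEMM, low-precision store, then absmax over the buffer."""
    acc = matmul(a, b)
    # Assumed to be what a fused AMAX_D epilogue would observe (pre-rounding).
    amax_exact = max(abs(v) for row in acc for v in row)
    # The GEMM writes D in low precision.
    d_stored = [[quantize(v, mantissa_bits) for v in row] for row in acc]
    # Work-around: a separate absmax pass over the stored output buffer.
    amax_buffer = max(abs(v) for row in d_stored for v in row)
    return d_stored, amax_buffer, amax_exact


if __name__ == "__main__":
    d, amax_buffer, amax_exact = gemm_with_amax_workaround(
        [[1.0, 0.0]], [[1.1], [0.0]], mantissa_bits=3
    )
    print(amax_buffer, amax_exact)  # prints "1.125 1.1"
```

With 3 mantissa bits, the exact result 1.1 is stored as 1.125, so the absmax read back from the buffer overestimates the true absmax. When the output values are exactly representable, the two results agree.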
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117822
Approved by: https://github.com/jianyuh, https://github.com/xw285cornell