onnxruntime
ce01ed02 - Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)

Commit
3 years ago
Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)

* add AddBiasTranspose kernel, new format of weights
* Use compact global_q in GEMM
* sequence_index from BxS to S; new stream for copy
* merge input and output pointers in scratch2
* update default benchmark tests
* add new format 0 for weight and bias
* avoid integer overflow
* check gpu memory
* output summary in benchmark
* add logging
* update unit tests with non empty bias value
* add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm