onnxruntime
ce01ed02 - Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)

Commit

3 years ago

Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)

* add AddBiasTranspose kernel, new format of weights
* Use compact global_q in GEMM
* sequence_index from BxS to S; new stream for copy
* merge input and output pointers in scratch2
* update default benchmark tests
* add new format 0 for weight and bias
* avoid integer overflow
* check gpu memory
* output summary in benchmark
* add logging
* update unit tests with non empty bias value
* add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm
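To give a rough idea of what a fused "add bias and transpose" step does, here is a minimal CPU reference sketch in plain Python. It is not the kernel from this commit. The function name `add_bias_transpose`, the input layout (B, S, M, N, H), the bias layout (M, N, H), and the output layout (M, B, N, S, H) are all assumptions chosen for illustration, where B is batch size, S is sequence length, M is the number of stacked matrices (for example Q, K, V), N is the number of heads, and H is the head size.

```python
def add_bias_transpose(x, bias, B, S, M, N, H):
    """Illustrative reference only; layouts are assumptions, not taken from the commit.

    x:    flat list with assumed layout (B, S, M, N, H)
    bias: flat list with assumed layout (M, N, H)
    Returns a flat list with assumed layout (M, B, N, S, H).
    """
    out = [0.0] * (M * B * N * S * H)
    for b in range(B):
        for s in range(S):
            for m in range(M):
                for n in range(N):
                    for h in range(H):
                        # Flat offset of element (b, s, m, n, h) in the input.
                        src = (((b * S + s) * M + m) * N + n) * H + h
                        # Flat offset of element (m, b, n, s, h) in the output.
                        dst = (((m * B + b) * N + n) * S + s) * H + h
                        # Bias is shared across batch and sequence positions.
                        out[dst] = x[src] + bias[(m * N + n) * H + h]
    return out
```

Doing the bias addition and the layout change in a single pass reads and writes each element once, rather than running a separate bias pass followed by a transpose pass. A GPU version would map the loop indices to thread and block indices instead of iterating.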

References

#12448 - Improve LongformerAttention performance

Author

tianleiwu

Parents

7df2e8c5
