Optimize quantized LSTM (#8634)

Commit

4 years ago

Optimize quantized LSTM (#8634)

* Optimize some LSTM gate computation. Remove unneeded string constructions.
* Change GCC optimization flags for compute-bound logic in rnn_helpers.
* Better QGemm for M=1.
* Some improvements on AVX512.
* Add a condition to limit GCC-related macros.
* Correct the QGemm assembly for the M=1 AVX2 optimization so that it passes mlas_test.
* Fix rnn_helper build issue for wasm.
* Better asm code according to feedback.
* Remove the customized vectorize and unroll options for GCC. Use restrict on some functions to help GCC vectorize them correctly. Rewrite clip_add_bias() so that GCC vectorizes it correctly.
* Better restrict semantics for merge_lstm_gates_to_memory() by adding in_place(). Add MSC __restrict to the clip_add_bias() method so that it vectorizes correctly.
* Force a CI restart because it was stuck on the onnxruntime-python-checks-ci-pipeline, which could not be restarted.
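The restrict-related items above can be illustrated with a minimal sketch. It is not the code from this commit: the function name `clip_add_bias_sketch`, its signature, and the `SKETCH_RESTRICT` macro are hypothetical, and the loop body (add a bias, then clamp to `[-clip, clip]`) is only an assumption about what such a helper does. The point shown is that marking the pointers as non-aliasing tells the compiler the output never overlaps the inputs, which is what typically lets GCC and MSVC vectorize a loop like this without runtime overlap checks.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical macro: MSVC spells the qualifier __restrict, while GCC and
// Clang accept __restrict__ in C++.
#if defined(_MSC_VER)
#define SKETCH_RESTRICT __restrict
#else
#define SKETCH_RESTRICT __restrict__
#endif

// Hypothetical helper, NOT the onnxruntime signature. Adds bias[i] to
// input[i], clamps the sum to [-clip, clip], and writes it to output[i].
// The restrict qualifiers promise that the three buffers do not overlap.
inline void clip_add_bias_sketch(float clip,
                                 const float* SKETCH_RESTRICT bias,
                                 const float* SKETCH_RESTRICT input,
                                 float* SKETCH_RESTRICT output,
                                 std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const float v = input[i] + bias[i];
    output[i] = std::min(std::max(v, -clip), clip);
  }
}
```

Because the qualifiers are a promise made by the caller, an in-place update (output aliasing input) would need a separate code path, which may be why the commit message mentions adding in_place() for merge_lstm_gates_to_memory().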

References

#8634 - Optimize quantized LSTM

Author

zhanghuanrong

Parents

caacf249

onnxruntime 76dfe810 - Optimize quantized LSTM (#8634)
