onnxruntime
0f8e66d9 - optimization for whisper model with decoder masked multihead attention (#15827)

Commit

3 years ago

optimization for whisper model with decoder masked multihead attention (#15827)

* graph tools update
* cuda kernel update
* operator spec update and implementation update
* greedy search bug fix on wrong assumption for cross/self attention input length
* avoid use of "" name in value info when loading graph which historically in many model

References

#15827 - optimization for whisper model with decoder masked multihead attention

Author

zhanghuanrong

Parents

be6c0bb5
