onnxruntime
0f8e66d9 - optimization for whisper model with decoder masked multihead attention (#15827)

Commit
2 years ago
optimization for whisper model with decoder masked multihead attention (#15827)

* graph tools update
* cuda kernel update
* operator spec update and implementation update
* greedy search bug fix on wrong assumption for cross/self attention input length
* avoid use of "" name in value info when loading graph which historically in many model