onnxruntime
b9d39e34 - Fix cuda Transpose bug 16039 (#16042)

Commit
2 years ago
### Description
On CUDA, Transpose fails for FLOAT16 tensors larger than 1048x1048 because the optimized code path exceeds the CUDA grid size of 65536. The fix is to fall back to the regular CUDA transpose in these cases.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16039
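
The fallback described above can be illustrated with a minimal host-side sketch. It is not the code from this commit: `ChooseTransposePath`, `BlocksNeeded`, `TransposePath`, and `kAssumedMaxGridDim` are hypothetical names, the limit is simply the 65536 figure quoted in the description, and how the real kernel derives its grid dimension from the tensor shape is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the grid-size limit quoted in the commit description.
// The constant actually used in onnxruntime may differ.
constexpr std::int64_t kAssumedMaxGridDim = 65536;

// Hypothetical labels for the two code paths mentioned in the description.
enum class TransposePath { kOptimized, kGeneric };

// Hypothetical helper: blocks needed to cover `elements` items when each
// block handles `elements_per_block` of them (ceiling division).
inline std::int64_t BlocksNeeded(std::int64_t elements,
                                 std::int64_t elements_per_block) {
  return (elements + elements_per_block - 1) / elements_per_block;
}

// Hypothetical dispatch check: take the optimized path only when the
// requested grid dimension stays within the assumed limit; otherwise
// fall back to the regular (generic) transpose.
inline TransposePath ChooseTransposePath(std::int64_t requested_grid_dim) {
  if (requested_grid_dim > kAssumedMaxGridDim) {
    return TransposePath::kGeneric;
  }
  return TransposePath::kOptimized;
}
```

The point of the guard is that an oversized launch is rejected before any kernel is started, so large inputs are routed to the slower but safe path instead of failing at launch time.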