Optimize CUDA Kernel for 3D and 4D Transpose #8928
Optimize Transpose120
9b70a80c
Optimize Transpose102
a7c3fc0f
Generalize Transpose0123 for more input shapes
f73fece9
SherlockNoMad
dismissed their stale review
via f73fece9
4 years ago
Fix build error
dfec92fb
Add debug log
c04302ca
Fix failing cases
5efdbda9
Add logging
0fd67d19
Relax check to run more cases
7f1ee3ac
Fix bug
42cf1a73
All tests passing
c5202510
Add Transpose3D test cases
3299087c
adjust order
f1289ca3
clean up
b7c20c40
Fix build
a5dc0d49
update rocm kernel
3322210e
ytaous
approved these changes
on 2021-09-13
SherlockNoMad
changed the title from "Optimize Transpose102 and Transpose120 for CUDA" to "Optimize CUDA Kernel for 3D and 4D Transpose" 4 years ago
SherlockNoMad
deleted the bahuang/optimize_transpose branch 4 years ago
Assignees
No one assigned
Labels
training
core runtime