Contiguify bias in slow_conv_transpose3d kernel (#84125)
Users never run into this because PyTorch now ships with cuDNN by
default, and cuDNN has a better conv_transpose implementation. However,
our CI appears to test without cuDNN, and ROCm also goes down this
path.
The .contiguous() call does not regress anything, because passing a
non-contiguous bias was previously a runtime error. And since this is
the "slow conv transpose3d" kernel, its performance is not a concern.
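For reference, a minimal sketch of the case this fixes (shapes are illustrative; on CPU, conv_transpose3d dispatches to the slow kernel). Before this change, the non-contiguous bias raised a runtime error on this path:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 4, 4, 4)      # (N, C_in, D, H, W)
w = torch.randn(2, 3, 3, 3, 3)      # (C_in, C_out, kD, kH, kW)
bias = torch.randn(6)[::2]          # strided view: length 3, non-contiguous
assert not bias.is_contiguous()

# With the fix, the kernel contiguifies bias internally and this succeeds.
out = F.conv_transpose3d(x, w, bias)
print(out.shape)  # torch.Size([1, 3, 6, 6, 6])
```

The result matches what you get by passing `bias.contiguous()` explicitly, which is why the internal copy cannot change numerics.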
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84125
Approved by: https://github.com/ngimel