Conv transpose/backward split 32bit (#31510)
Summary:
Basically the same as https://github.com/pytorch/pytorch/pull/31379, except that I wrote a separate function, `split_batch_dim_to_32bit_out`, for the splitting logic. This function could also be used for convolution forward: I will rebase this PR after https://github.com/pytorch/pytorch/issues/31379 is merged and then change `raw_cudnn_convolution_forward_out` to use `split_batch_dim_to_32bit_out`.
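For readers unfamiliar with the idea, the following is a minimal sketch of how a batch dimension might be split so that every chunk stays within 32-bit indexing. It is not the code in this PR: the helper name `plan_batch_splits`, its parameters, and the choice of `2**31 - 1` as the limit are assumptions made for illustration only.

```python
INT32_MAX = 2**31 - 1  # assumed limit: largest element count addressable with a signed 32-bit index


def plan_batch_splits(batch_size, input_inner, output_inner, limit=INT32_MAX):
    """Hypothetical helper: return (start, length) ranges along the batch dimension.

    `input_inner` and `output_inner` are the number of elements in one sample of
    the input and output. Each returned range covers at most `limit` elements in
    both tensors.
    """
    max_inner = max(input_inner, output_inner)
    if max_inner > limit:
        # Splitting along the batch dimension cannot help in this case.
        raise ValueError("a single sample already exceeds the index limit")
    if batch_size * max_inner <= limit:
        # Everything fits, so a single call is enough.
        return [(0, batch_size)]
    # Largest number of samples per chunk that still fits under the limit.
    split_size = max(limit // max_inner, 1)
    return [
        (start, min(split_size, batch_size - start))
        for start in range(0, batch_size, split_size)
    ]


if __name__ == "__main__":
    for start, length in plan_batch_splits(10, 4, 3, limit=16):
        print(start, length)
```

In a C++ setting, each `(start, length)` pair would typically be applied to both tensors with something like `narrow(0, start, length)` before calling the 32-bit kernel on the slice; whether this PR does exactly that is not stated here.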
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31510
Differential Revision: D19210563
Pulled By: ngimel
fbshipit-source-id: e20bb82b6360aa2c0e449e127188c93f44e1e9b4