[Nested Tensor] fix from_padded bug (#84217)
Fixes #84082
As explained in the issue, the problem arose because the incoming grad was not contiguous and the fast kernel did not handle non-contiguous input correctly. An alternative fix would be to add a contiguous() call at https://github.com/pytorch/pytorch/blob/d144594512e10ab2a9625347816c2dee1fb55667/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp#L45
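To illustrate the failure mode, here is a toy pure-Python model. It is not PyTorch's actual kernel. A 2-D "tensor" is represented as (storage, shape, strides). The hypothetical naive_fast_kernel assumes row-major contiguous storage and ignores strides, so it misreads a transposed (non-contiguous) view. Materializing a contiguous copy first, which is roughly what a contiguous() call does, makes the same fast path correct. All function names here are illustrative assumptions.

```python
# Toy stride model (hypothetical, not the ATen implementation).

def element(storage, strides, i, j):
    """Read element (i, j) honoring the view's actual strides."""
    return storage[i * strides[0] + j * strides[1]]

def contiguous(storage, shape, strides):
    """Materialize a row-major copy, analogous to Tensor.contiguous()."""
    rows, cols = shape
    flat = [element(storage, strides, i, j) for i in range(rows) for j in range(cols)]
    return flat, shape, (cols, 1)

def naive_fast_kernel(storage, shape):
    """Hypothetical fast path: assumes contiguous row-major storage, ignores strides."""
    rows, cols = shape
    return [storage[i * cols:(i + 1) * cols] for i in range(rows)]

# A 2x3 tensor [[0, 1, 2], [3, 4, 5]] stored row-major.
base = [0, 1, 2, 3, 4, 5]
# Its transpose is a 3x2 view over the same storage with swapped strides (not contiguous).
t_shape, t_strides = (3, 2), (1, 3)

# Fast path on the non-contiguous view reads rows of the original storage: wrong values.
wrong = naive_fast_kernel(base, t_shape)
# Making the view contiguous first lets the same fast path produce the true transpose.
flat, shape, strides = contiguous(base, t_shape, t_strides)
right = naive_fast_kernel(flat, shape)

print(wrong)  # [[0, 1], [2, 3], [4, 5]]
print(right)  # [[0, 3], [1, 4], [2, 5]]
```

The trade-off between the two fixes is the usual one: a contiguous() call is simple but may allocate and copy, while making the kernel itself stride-aware (or checking contiguity before taking the fast path) avoids the extra copy.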
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84217
Approved by: https://github.com/albanD