Error out of default_collate for lists of unequal size (#38492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23141
In the example below, `default_collate` collates each element of the list. Because the second element of `bar` is not present in every sample, it is silently discarded:
```
from torch.utils.data import Dataset, DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx + 1),
        }

training = CustomDataset()
for batch in DataLoader(training, batch_size=2):
    print(batch)
```
Yields
```
{
    'foo': tensor([[1, 2, 3],
                   [1, 2, 3]]),
    'bar': [('X', 'X')]
}
```
Based on the discussion in the issue, the best course of action is to error out in this case. This is consistent with what is done for tensor elements: the check in [TensorShape.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060), which runs when `torch.stack` is called, raises an error for tensors of unequal size. This PR introduces a similar error message for lists.
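The check can be sketched roughly as follows. This is a simplified standalone version for illustration, not the exact upstream diff; the function name `check_equal_size` is made up here, while in the PR the check lives inside `default_collate`'s sequence branch:

```python
# Rough sketch of the equal-size check this PR adds for list elements
# (illustrative names, not the exact upstream diff).

def check_equal_size(batch):
    """Raise if the per-sample sequences in `batch` differ in length."""
    it = iter(batch)
    elem_size = len(next(it))
    if not all(len(elem) == elem_size for elem in it):
        raise RuntimeError("each element in list of batch should be of equal size")

# In the example above, sample 0 yields ["X"] and sample 1 yields ["X", "X"],
# so collating "bar" now fails loudly instead of silently dropping data:
try:
    check_equal_size([["X"], ["X", "X"]])
except RuntimeError as e:
    print(e)
```

With this in place, the dataset in the example above raises a `RuntimeError` at collation time rather than returning a truncated batch.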
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492
Differential Revision: D21620396
Pulled By: ezyang
fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c