transformers
30d8919a - in the resize() function in image_transforms.py, the line 267: (#20728)

Commit
3 years ago
in the resize() function in image_transforms.py, the line 267: (#20728)

`image = to_channel_dimension_format(image, ChannelDimension.LAST)`

is redundant, as the same conversion is also applied in to_pil_image(). This redundant call makes training fail in rare cases. The problem can be reproduced with the following code snippet:

```
import torch
from transformers.models.clip import CLIPFeatureExtractor

vision_processor = CLIPFeatureExtractor.from_pretrained('openai/clip-vit-large-patch14')

# Channels-first images with very small spatial dimensions
images = [
    torch.rand(size=(3, 2, 10), dtype=torch.float),
    torch.rand(size=(3, 10, 1), dtype=torch.float),
    torch.rand(size=(3, 1, 10), dtype=torch.float)
]

for image in images:
    processed_image = vision_processor(images=image, return_tensors="pt")['pixel_values']
    print(processed_image.shape)
    assert processed_image.shape == torch.Size([1, 3, 224, 224])
```

The last image has a height of 1 pixel. The second call to to_channel_dimension_format() transposes the image again, and the height dimension is then wrongly treated as the channels dimension. Because of this, the following normalize() step raises an exception.
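The failure mode described above can be illustrated with a small NumPy sketch. It assumes the channel axis is guessed purely from axis sizes (1 or 3 counts as channels, first axis checked first); `infer_channel_axis` and `to_channels_last` are hypothetical stand-ins written for this illustration, not the library's implementation.

```python
import numpy as np


def infer_channel_axis(image):
    # Hypothetical heuristic (an assumption for this sketch): an axis of
    # size 1 or 3 is taken to be the channel axis, first axis checked first.
    if image.shape[0] in (1, 3):
        return "first"
    if image.shape[-1] in (1, 3):
        return "last"
    raise ValueError("Unable to infer the channel axis")


def to_channels_last(image):
    # Move the channel axis to the end only if it is believed to be first.
    if infer_channel_axis(image) == "first":
        return image.transpose(1, 2, 0)
    return image


# (channels, height, width) with a height of 1 pixel
image = np.random.rand(3, 1, 10)

once = to_channels_last(image)   # correct: (height, width, channels)
twice = to_channels_last(once)   # height of 1 is mistaken for channels

print(once.shape)
print(twice.shape)

# An image whose height is not 1 or 3 survives the repeated call unchanged.
safe = to_channels_last(to_channels_last(np.random.rand(3, 10, 1)))
print(safe.shape)
```

Under this assumption the conversion is not idempotent when the height is 1 (or 3), which matches the commit's observation that only some inputs fail.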