vision: Inconsistent API for tensor and PIL image for CenterCrop

🐛 Bug

The CenterCrop transform crops the center of an image. However, when the crop size is larger than the image size (in any dimension), the results differ between PIL.Image input and tensor input.

For a PIL.Image, the image is padded, while for a tensor the result is completely wrong due to an incorrect crop-location computation (negative position values are computed).

To Reproduce

Steps to reproduce the behavior:

import torch
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF

pil_x = TF.to_pil_image(torch.randn(3, 24, 24))  
transforms.CenterCrop(50)(pil_x).size  # PIL image size is 50x50

tensor_x = torch.randn(3, 24, 24)
transforms.CenterCrop(50)(tensor_x).shape  # Tensor shape is torch.Size([3, 12, 12])
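
The wrong tensor shape comes from the negative-offset arithmetic; here is a minimal sketch in plain PyTorch (it mirrors the computation described above, not torchvision's exact implementation):

import torch

img = torch.randn(3, 24, 24)
crop = 50
top = (24 - crop) // 2  # -13: the offset goes negative once crop > image size
out = img[..., top:top + crop, top:top + crop]  # slice(-13, 37) on a dim of size 24
print(out.shape)  # negative indices wrap around, so the crop silently shrinks far below (3, 50, 50)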

Expected behavior

Mainly, the API should be consistent for both inputs, or there should be two different methods for PIL and tensor inputs (which is IMO less appealing). Side note: the docs should be updated to state what happens when the crop size is greater than the image size. I believe this condition is missing from the documentation of the other crop methods as well.
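
For concreteness, consistency could mean either making both paths pad, or making both paths fail loudly; a hypothetical sketch of the latter (check_crop_size is illustrative, not torchvision code; the error message is the one suggested later in this thread):

import torch

def check_crop_size(img, crop_h, crop_w):
    # Hypothetical validation: reject crops larger than the tensor image
    h, w = img.shape[-2:]
    if crop_h > h or crop_w > w:
        raise ValueError("Crop size should be smaller than input size")

check_crop_size(torch.randn(3, 24, 24), 50, 50)  # raises ValueError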

Environment

Collecting environment information...
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Quadro T2000
Nvidia driver version: 450.102.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.2.89              hfd86e86_1  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.2.0            py38h23d657b_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.19.2           py38h54aff64_0  
[conda] numpy-base                1.19.2           py38hfa32c7d_0  
[conda] pytorch                   1.7.0           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
[conda] torchaudio                0.7.0                      py38    pytorch
[conda] torchvision               0.8.1                py38_cu102    pytorch

cc @vfdev-5


Most upvoted comments

Sounds good! Please take a look at our contributing guide, which may help you with the dev setup, etc. A draft PR is also good, so that we can iterate on the implementation if needed, and do not hesitate to ask questions here for details.

No worries, and thank you for pad_symmetric; it would work fine here.
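
For reference, padding_mode="symmetric" in torchvision's F.pad mirrors the border pixels instead of zero-filling, and it works on tensors; a tiny demo (values chosen arbitrarily):

import torch
import torchvision.transforms.functional as F

x = torch.arange(4.0).reshape(1, 2, 2)
# pad by 1 on each side: [left, top, right, bottom]
print(F.pad(x, [1, 1, 1, 1], padding_mode="symmetric"))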

Tested on PIL version ‘6.2.1’ (found this is too old, so retested on newer versions 😄) and ‘7.2.0’. And yep, there was padding in both.

Regarding the ETA: I will hopefully make the changes in 3-4 days (it's only a few hours of work, but this is my first time working on the vision lib, so setup might delay me a little).

@vfdev-5 First, I am really sorry if I am mistaken/confused, and I would really appreciate your time and explanation here.

No, PIL is also incorrect, as the data is still 100x100. I mean it didn’t pad the image.

The padding is applied to the output, not to the input of the CenterCrop method, which is expected; the input should not be changed. I say “should” after looking over various transforms, where I didn’t see any in-place ops that modify the input.

The first image (I1) shows a sample image from PASCAL and the CenterCrop output with a crop size larger than the image size. The padding is applied to the output.

[Screenshot I1: PASCAL sample image and the padded CenterCrop output]

The second image (I2) shows that even when the crop size is smaller than the image size, the input image to the CenterCrop method is not modified. I only ran this test to check my assumption that the transform does not change its input.

[Screenshot I2: the input image is unchanged after CenterCrop]
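
The I2 check can also be reproduced without screenshots; a minimal sketch of the same assertion:

import torch
import torchvision.transforms as transforms

x = torch.randn(3, 100, 100)
x_before = x.clone()
_ = transforms.CenterCrop(50)(x)
assert torch.equal(x, x_before)  # the input tensor is left unchanged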

@vfdev-5 You mentioned:

print(transforms.CenterCrop(150)(pil_x).size) # PIL image size is 50x50

Did you mean “150x150” here?

I mean, something like

import pytest

# x = PIL_image or tensor
with pytest.raises(ValueError, match=r"Crop size should be smaller than input size"):
    transforms.CenterCrop(150)(x)

PIL seems to give the correct output so far, so raising an error would break backward compatibility. Will that be fine?

No, PIL is also incorrect, as the data is still 100x100. I mean it didn’t pad the image. IMO, there is no BC break in raising an exception for both cases…

Fun bug with PIL and tensors. I think it would make sense to raise an error saying that we cannot crop to a larger size.

import torch
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF
import numpy as np


pil_x = TF.to_pil_image(127 * torch.ones(3, 100, 100))  
print(transforms.CenterCrop(150)(pil_x).size)  # PIL image size is 50x50
print(np.asarray(pil_x).shape)

tensor_x = torch.randn(3, 100, 100)
print(transforms.CenterCrop(150)(tensor_x).shape)
> (150, 150)
> (100, 100, 3)
> torch.Size([3, 24, 24])

EDIT: My bad about the Pillow output size being 100x100. It is 150x150, and the output image is padded.

@saurabheights Thanks for reporting and for making the issue reproducible. That’s definitely a bug. We are happy to accept a PR that solves the problem.

Concerning the proposal of extending the functionality and adjusting the API, I think this should be discussed in a separate issue because we need to assess the backward compatibility.

@vfdev-5 Any thoughts on this?

One last thing: I can work on a PR for it. Let me know if that's alright with you.

That sounds great! Let’s wait for the green light from a member.

One last thing: I can work on a PR for it. Let me know if that's alright with you.

P.S. A lot of the functionality needed is similar to RandomCrop, which provides flexible padding, so it would be a good idea to consider integrating those options into CenterCrop, since apart from the offset computation, the rest of the cropping is the same (see the sketch after the snippet below).

Applying RandomCrop to both the PIL image and the tensor image shows the same content (but with a different offset).

transforms.RandomCrop(50, pad_if_needed=True)(pil_x).show()
transforms.ToPILImage()((transforms.RandomCrop(50, pad_if_needed=True)(tensor_x))).show()
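
To make the P.S. concrete, here is a hypothetical sketch of a CenterCrop variant with RandomCrop-style pad_if_needed semantics (CenterCropPadIfNeeded is illustrative only; no such class exists in torchvision):

import torch
import torch.nn as nn
import torchvision.transforms.functional as F

class CenterCropPadIfNeeded(nn.Module):
    # Illustrative only: pads the input (like RandomCrop's pad_if_needed)
    # before taking the center crop, for both PIL and tensor inputs.
    def __init__(self, size, fill=0, padding_mode="constant"):
        super().__init__()
        self.size = size
        self.fill = fill
        self.padding_mode = padding_mode

    def forward(self, img):
        if isinstance(img, torch.Tensor):
            h, w = img.shape[-2:]
        else:
            w, h = img.size  # PIL reports (width, height)
        pad_h, pad_w = max(self.size - h, 0), max(self.size - w, 0)
        if pad_h or pad_w:
            # torchvision's F.pad takes [left, top, right, bottom]
            img = F.pad(img, [pad_w // 2, pad_h // 2,
                              pad_w - pad_w // 2, pad_h - pad_h // 2],
                        fill=self.fill, padding_mode=self.padding_mode)
        return F.center_crop(img, [self.size, self.size])

print(CenterCropPadIfNeeded(50)(torch.randn(3, 24, 24)).shape)  # torch.Size([3, 50, 50])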

Big HP Fan 😄, He-Who-Must-Not-Be-Named.

Just when I thought I missed one spot… #3295

I added this in #3224 for now, what do you think? cc @datumbox