FastMaskRCNN: TypeError: long() argument must be a string or a number, not 'JpegImageFile'

When I run: python download_and_convert_data.py

>> Converting image 23751/82783 shard 11
Converting image 23801/82783 shard 11
Converting image 23851/82783 shard 11
None Annotations data/coco/train2014/COCO_train2014_000000167118.jpg
Traceback (most recent call last):
  File "download_and_convert_data.py", line 36, in <module>
    tf.app.run()
  File "/mnt/data1/daniel/tensorflow/_python_build/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "download_and_convert_data.py", line 30, in main
    download_and_convert_coco.run(FLAGS.dataset_dir)
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 338, in run
    'train2014')
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 299, in _add_to_tfrecord
    img = img.astype(np.uint8)
TypeError: long() argument must be a string or a number, not 'JpegImageFile'

Why does this happen?

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 25 (2 by maintainers)

Most upvoted comments

Some ideas for further investigation:

  1. Try replacing Image.open with cv2.imread. To do so you must install OpenCV (pip install opencv-python). If this fixes the error, the issue is with PIL. Alternatively, you could also try loading the images with scipy.

  2. If this does not help, wrap the code in a try/except block and check which images raise the exception: all of them, just some, or even just one.

  3. Update all libraries.
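The try/except idea could be sketched roughly as below. This is only an illustration, not code from the project: the names load_with_pil and find_unreadable_images are made up here, and the default loader assumes Pillow is installed. Image.open is lazy, so the sketch calls load() to force the pixel data to be decoded.

```python
import os


def load_with_pil(path):
    """Fully decode an image with Pillow; raises if the data cannot be read."""
    # Imported lazily so the scanning logic itself has no hard dependency.
    from PIL import Image

    with Image.open(path) as im:
        im.load()  # Image.open only reads the header; load() decodes the pixels


def find_unreadable_images(image_dir, load=load_with_pil,
                           extensions=(".jpg", ".jpeg")):
    """Return a sorted list of (path, error message) for files that fail to load.

    `load` is any callable that raises when a file cannot be decoded
    (hypothetical hook, so a cv2-based loader could be swapped in).
    """
    failures = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(root, name)
            try:
                load(path)
            except Exception as exc:  # broad on purpose: record and keep going
                failures.append((path, f"{type(exc).__name__}: {exc}"))
    return sorted(failures)
```

Calling find_unreadable_images("data/coco/train2014") would then list the files that fail to decode, which shows whether it is all images, some, or just one.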

@kevinkit Good idea about cv2.imread. Be aware, though, that OpenCV loads the image in BGR channel order rather than RGB, so you'll need to reverse the channels, and perhaps make a copy of the re-strided array in case any downstream functions expect contiguous data:

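import cv2
import numpy as np
img = cv2.imread("image.jpg").astype(np.uint8)[:, :, ::-1]  # reverse the channel axis: BGR -> RGB
img = np.ascontiguousarray(img)  # copy the reversed view into contiguous memory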

Note that although OpenCV does provide a separate image I/O library, the imread/imsave functions in scipy are just fairly thin wrappers around the PIL/Pillow library so you would expect the same error as before:

>>> from scipy.misc import imread
>>> imread("truncated.jpg")
array(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x11CB5FFD0>, dtype=object)

@Designbook1 Hope you’ve solved the problem, please do let us know.

@Designbook1 Sounds like the input JPEG image is corrupted (possibly just truncated). The imaging library can identify the file type as JPEG, but it can’t read the data, so it can’t convert the JpegImageFile into a numpy array.

Compare:

>>> im = Image.open("image.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x1086AFF50>
>>> np.array(im, dtype=np.uint8)
array([[[105, 100,  94],
        [107, 102,  96],
        [109, 104,  98],
        ...

versus

>>> im = Image.open("truncated.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x107073810>
>>> np.array(im, dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: long() argument must be a string or a number, not 'JpegImageFile'

It may be easiest to just re-download the images. If you're feeling particularly motivated, you could use (say) djpeg to find which images are broken.

A nice feature request would be a checksum to check if the downloaded data is ok.

The error comes from the fact that if you open a file with

Image.open(...)

it creates an Image object from the PIL library, of type 'JpegImageFile'. The call img.astype(np.uint8) expects a NumPy array of numeric data. The error occurs because, at the point this call is made, img holds a JpegImageFile rather than numeric pixel data.
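A rough sketch of that failure mode, without needing a broken JPEG: FakeImage is a hypothetical stand-in for an image object whose pixels cannot be read, and require_pixel_array is an illustrative guard, not part of FastMaskRCNN. When NumPy cannot interpret an object as array data it wraps it in a zero-dimensional object array (compare the dtype=object output from scipy above), and the later astype call then fails.

```python
import numpy as np


class FakeImage:
    """Hypothetical stand-in for an image object whose pixels cannot be read."""


def require_pixel_array(arr):
    """Return arr as an ndarray, or raise a descriptive error if it only
    wraps a Python object instead of holding numeric pixel data."""
    arr = np.asarray(arr)
    if arr.dtype == object:
        raise ValueError(
            "expected numeric pixel data, got an object array with shape %r; "
            "the image file is probably corrupted or truncated" % (arr.shape,)
        )
    return arr


# NumPy cannot interpret the object as pixels, so it wraps it as-is.
wrapped = np.array(FakeImage())

try:
    wrapped.astype(np.uint8)  # tries to convert the wrapped object to an integer
except TypeError as exc:
    astype_error = exc  # same kind of TypeError as in the traceback
```

Checking the array with a guard like this right after loading would replace the confusing TypeError with a message that points at the image file.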

I also have this problem, but it fails on a different file than @Designbook1's (mine is around image 70,000 and his is around 23851). I don't believe the downloaded file has problems, but my disk filled up during the unzip process, so I had to switch to another disk, which might have corrupted some of the images. I just unzipped the data again and it looks good now.

@kevinkit Yeah, I guess, but it’s pretty easy to run md5sum on the downloads. I’d prefer the project to focus on the tricky groundbreaking stuff 😃

For what it’s worth, I report the following MD5 sums for the big zip files:

5750999c8c964077e3c81581170be65b  captions_train-val2014.zip
59582776b8dd745d649cd249ada5acf7  instances_train-val2014.zip
926b9df843c698817ee62e0e049e3753  person_keypoints_trainval2014.zip
0da8c0bd3d6becc4dcb32757491aca88  train2014.zip
a3d79f5ed8d289b7a7554ce06a5782b3  val2014.zip
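For anyone without md5sum at hand, the same check can be sketched in Python with the standard hashlib module. The function names md5_of_file and verify_downloads are made up for this example; the expected sums are the ones reported above, and the directory passed in is whatever folder holds the zip files.

```python
import hashlib
import os

# MD5 sums as reported in the comment above.
EXPECTED_MD5 = {
    "captions_train-val2014.zip": "5750999c8c964077e3c81581170be65b",
    "instances_train-val2014.zip": "59582776b8dd745d649cd249ada5acf7",
    "person_keypoints_trainval2014.zip": "926b9df843c698817ee62e0e049e3753",
    "train2014.zip": "0da8c0bd3d6becc4dcb32757491aca88",
    "val2014.zip": "a3d79f5ed8d289b7a7554ce06a5782b3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Return {filename: 'ok' | 'mismatch' | 'missing'} for each expected file."""
    results = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = "missing"
        elif md5_of_file(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

A matching sum only shows that the download is intact. It would not catch files damaged later during unzipping, as in the full-disk case described above.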