FastMaskRCNN: TypeError: long() argument must be a string or a number, not 'JpegImageFile'
When I run `python download_and_convert_data.py`, I get:

    >> Converting image 23751/82783 shard 11
    Converting image 23801/82783 shard 11
    Converting image 23851/82783 shard 11
    None Annotations data/coco/train2014/COCO_train2014_000000167118.jpg
    Traceback (most recent call last):
      File "download_and_convert_data.py", line 36, in <module>
        tf.app.run()
      File "/mnt/data1/daniel/tensorflow/_python_build/tensorflow/python/platform/app.py", line 48, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "download_and_convert_data.py", line 30, in main
        download_and_convert_coco.run(FLAGS.dataset_dir)
      File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 338, in run
        'train2014')
      File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 299, in _add_to_tfrecord
        img = img.astype(np.uint8)
    TypeError: long() argument must be a string or a number, not 'JpegImageFile'
Why did this happen?
About this issue
- State: open
- Created 7 years ago
- Comments: 25 (2 by maintainers)
An idea for further investigation: replace `Image.open` with `cv2.imread`. To do so you must install OpenCV (`pip install opencv-python`). If this fixes the error, there is an issue with PIL. Alternatively, you could also try loading the images with scipy.
If this does not help, wrap the code in a try/except block and check which images raise the exception: all of them, only some, or just one.
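A minimal sketch of that try/except idea, in case it helps. The names `find_failing_images` and `convert` are made up for illustration; `convert` stands in for whatever per-image code currently fails (for example the load plus `astype` step), it is not a function from the FastMaskRCNN repository.

```python
def find_failing_images(paths, convert):
    """Run `convert` on every path and collect the ones that raise.

    `convert` is a hypothetical stand-in for the per-image conversion code.
    Returns a list of (path, exception class name, message) tuples.
    """
    failures = []
    for path in paths:
        try:
            convert(path)
        except Exception as exc:  # deliberately broad: log the failure and keep going
            failures.append((path, type(exc).__name__, str(exc)))
    return failures
```

Printing the returned list after a full pass should show whether the problem affects every image or only a handful of files.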
Update all libs etc.
@kevinkit Good idea about the cv2.imread – be aware though that OpenCV will load the image in BGR channel order rather than RGB, so you’ll need to reverse the channel order, and perhaps make a copy of the re-strided array just in case there are any downstream functions that are expecting contiguous data:
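Something along these lines should do it. This is only a sketch: the helper names `bgr_to_rgb` and `load_rgb_with_opencv` are my own, and it assumes `numpy` and `opencv-python` are installed.

```python
import numpy as np


def bgr_to_rgb(bgr):
    """Reverse the last (channel) axis and return a C-contiguous copy."""
    # bgr[:, :, ::-1] is a negatively strided view; ascontiguousarray copies it
    # into a fresh contiguous buffer.
    return np.ascontiguousarray(bgr[:, :, ::-1])


def load_rgb_with_opencv(path):
    """Load a colour image with OpenCV and return it as an RGB uint8 array."""
    import cv2  # imported here so bgr_to_rgb works without OpenCV installed

    bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if bgr is None:  # cv2.imread returns None instead of raising on failure
        raise IOError("could not read image: %s" % path)
    return bgr_to_rgb(bgr)
```

The explicit `None` check matters here because an unreadable file would otherwise only surface later as a confusing error further down the pipeline.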
Note that although OpenCV does provide a separate image I/O library, the imread/imsave functions in scipy are just fairly thin wrappers around the PIL/Pillow library, so you would expect the same error as before.
@Designbook1 Hope you’ve solved the problem, please do let us know.
@Designbook1 Sounds like the input JPEG image is corrupted (possibly just truncated). The imaging library can identify the file type as JPEG, but it can’t read the data, so it can’t convert the JpegImageFile into a numpy array.
Compare:
versus
May be easiest just re-downloading the images. If you’re feeling particularly motivated you could use (say) djpeg to find which images are broken.
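If you would rather stay in Python than use djpeg, a rough equivalent with Pillow could look like the sketch below. The function names are made up for illustration, and it assumes that forcing a full decode with `img.load()` raises an `OSError` for truncated files, which is Pillow's default behaviour as long as `ImageFile.LOAD_TRUNCATED_IMAGES` has not been switched on.

```python
import os

from PIL import Image


def is_readable_image(path):
    """Return True if Pillow can fully decode the file, False otherwise."""
    try:
        with Image.open(path) as img:
            img.load()  # Image.open is lazy; load() forces the full decode
        return True
    except (OSError, SyntaxError):
        # Unidentifiable or truncated files end up here.
        return False


def find_broken_images(directory, extensions=(".jpg", ".jpeg")):
    """List the image files in `directory` that cannot be fully decoded."""
    broken = []
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(extensions):
            path = os.path.join(directory, name)
            if not is_readable_image(path):
                broken.append(path)
    return broken
```

Running `find_broken_images` on a directory such as `data/coco/train2014` should list the files worth re-extracting or re-downloading.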
A nice feature request would be a checksum to check if the downloaded data is ok.
The error arises because opening a file with `Image.open(...)` creates an image object from the PIL library, here of class `JpegImageFile`. The call `img.astype(np.uint8)` is a NumPy array method that converts the contents to integers. So the error occurs because, at the point where this call is made, `img` does not hold decoded pixel data but still the `JpegImageFile` object.

I also have this problem, and the failing file is different from @Designbook1's (mine is around image 70,000 and his is around 23,851). I don't believe the downloaded file has problems, but during the unzip process my disk filled up, so I had to switch to another disk, which might have corrupted some of the images. I just unzipped the data again and it looks good now.
@kevinkit Yeah, I guess, but it’s pretty easy to run md5sum on the downloads. I’d prefer the project to focus on the tricky groundbreaking stuff 😃
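For anyone without md5sum at hand, the same check can be sketched in Python with the standard library. The helper name `md5sum` and the chunk size are my own choices.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result for each downloaded zip against a known-good value tells you whether the download itself is intact, though it says nothing about a failed extraction.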
For what it’s worth, I report the following MD5 sums for the big zip files: