tensorflow: Getting assertion failed: [Unable to decode bytes as JPEG, PNG, or GIF] when training using tensorflow object detection api

I tried to use the TensorFlow Object Detection API with my own dataset.
Everything was working just fine until it suddenly crashed with the following error messages:

...
INFO:tensorflow:global step 10560: loss = 0.4366 (0.809 sec/step)
INFO:tensorflow:global step 10561: loss = 0.3834 (0.822 sec/step)
INFO:tensorflow:global step 10562: loss = 0.3611 (0.829 sec/step)
INFO:tensorflow:global step 10563: loss = 0.3549 (0.901 sec/step)
INFO:tensorflow:global step 10564: loss = 0.3634 (0.839 sec/step)
INFO:tensorflow:global step 10565: loss = 0.3396 (0.813 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [Unable to decode bytes as JPEG, PNG, or GIF]
         [[Node: case/If_0/decode_image/cond_jpeg/cond_png/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/cond_png/is_gif, case/If_0/decode_image/cond_jpeg/cond_png/Assert/Assert/data_0)]]
INFO:tensorflow:global step 10566: loss = 0.3459 (0.889 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 759, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1063, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [Unable to decode bytes as JPEG, PNG, or GIF]
         [[Node: case/If_0/decode_image/cond_jpeg/cond_png/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/cond_png/is_gif, case/If_0/decode_image/cond_jpeg/cond_png/Assert/Assert/data_0)]]

G:\Tensorflow_section\models-master\object_detection>

When I upgraded to 1.3.0, the error changed, and this is what I get now:

...
INFO:tensorflow:global step 10635: loss = 0.3392 (0.822 sec/step)
INFO:tensorflow:global step 10636: loss = 0.3529 (0.823 sec/step)
INFO:tensorflow:global step 10637: loss = 0.3305 (0.831 sec/step)
2017-09-14 20:02:02.545415: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]
INFO:tensorflow:global step 10638: loss = 0.3599 (0.858 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]

G:\Tensorflow_section\models-master\object_detection>

I have no idea what is causing the issue. Could it be that some images have the wrong extension? For example, an image that was actually a 4-channel PNG may have been saved as a .jpg. If that is the case, how do I spot the faulty image file? Or, even better, why doesn't TF pick the correct type/number of channels by itself? Right now the error is not descriptive enough; it gives no hint about which image file is corrupted or causing the problem. If the cause of these errors is what I described above, then it should be caught when the TFRecord is being created. And if TFRecords are not the only input mechanism, the same check for identifying the culprit needs to be implemented there as well. Anyway, if all my assumptions are wrong, I would appreciate any help with this issue.
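
For reference, this is the kind of pre-check I have in mind, as a minimal sketch: it assumes Pillow is installed and that the images live under an images/ directory (the paths are placeholders), and it flags files whose real on-disk format or channel count does not match the 3-channel JPEG the pipeline expects.

    # Hypothetical pre-check sketch (not part of the API): flag files whose real
    # format or channel count differs from the expected 3-channel JPEG.
    import glob
    import os

    from PIL import Image

    for path in glob.glob(os.path.join('images', '**', '*.jpg'), recursive=True):
        with Image.open(path) as img:
            # img.format reports the real on-disk format, regardless of extension.
            if img.format != 'JPEG':
                print(path, '- extension says jpg, file is actually', img.format)
            # Mode 'RGBA' (or a palette mode) means the image is not plain 3-channel RGB.
            if img.mode not in ('RGB', 'L'):
                print(path, '- unexpected mode', img.mode)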

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 x64 1703, Build 15063.540
  • TensorFlow installed from (source or binary): binary (installed with pip)
  • TensorFlow version (use command below): 1.2.1 and 1.3.0
  • Python version: 3.5.3 (Anaconda)
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: CUDA 8.0 / cuDNN v5.1 and v6.0 (after upgrading to 1.3.0, cuDNN v6.0 is in the PATH)
  • GPU model and memory: GTX-1080 - 8G

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 8
  • Comments: 29

Most upvoted comments

OK, thanks to God, I could finally figure out what the issue was. It seems that when creating TFRecords, only JPEG images are supported, and nowhere in the documentation is this indicated! Also, when you try to use other types, it does not issue any warnings or throw any exceptions, so people like me lose an immense amount of time debugging something that could easily have been spotted and fixed in the first place. Anyway, converting all images to JPEG solved this weird, hellbound issue.
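
For completeness, one way to do the conversion is sketched below with Pillow; the folder layout and paths are assumptions, and RGBA images have to be flattened to RGB before they can be saved as JPEG.

    # Conversion sketch (assumed paths): re-encode every image as a real
    # 3-channel JPEG so the TFRecord creation step only ever sees JPEG bytes.
    import glob
    import os

    from PIL import Image

    for path in glob.glob(os.path.join('images', '**', '*.*'), recursive=True):
        try:
            with Image.open(path) as img:
                rgb = img.convert('RGB')  # drops the alpha channel of RGBA PNGs
        except IOError:
            continue  # skip non-image files (e.g. annotation XMLs)
        out = os.path.splitext(path)[0] + '.jpg'
        rgb.save(out, 'JPEG', quality=95)
        if out != path:
            os.remove(path)  # drop the original non-JPEG file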

For those who are still having trouble: some files disguise themselves as .jpg but are not actually JPEG. Create a Python file and copy this code (modified from another comment's snippet), changing the directories if you need to. It will pick out the files that are not actually JPEG; delete them and you are good to go.

    import glob
    import imghdr
    import os

    import cv2

    for folder in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/' + folder)
        print(image_path)
        for file in glob.glob(image_path + '/*.jpg'):
            image = cv2.imread(file)
            file_type = imghdr.what(file)
            if file_type != 'jpeg':
                print(file + " - invalid - " + str(file_type))
                # cv2.imwrite(file, image)

There's a simple way I fixed this using imghdr and OpenCV. In my case, all the image files had the '.jpg' extension, but some of them were not actually JPEGs. After running this code, regenerate your TFRecord file and you should be good to go.

        import imghdr

        import cv2

        # image_file_list: list of paths to the files with a .jpg extension
        for file in image_file_list:
            image = cv2.imread(file)
            file_type = imghdr.what(file)  # true on-disk format, e.g. 'png'
            if file_type != 'jpeg':
                print(file + " - invalid - " + str(file_type))
                cv2.imwrite(file, image)  # re-encode as a real JPEG (the .jpg extension decides the format)

@Coderx7 I also made the same mistake with jpg. What's the matter?

@Coderx7 we need more heroes like you 😃

Oh, sometimes the training or testing data is a mix of JPEG and PNG images. You can try this:

    import tensorflow as tf

    # The dataset yields (filename, label) pairs, so the map function takes both.
    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        # Decode by actual content, not by file extension.
        image_decoded = tf.cond(
            tf.image.is_jpeg(image_string),
            lambda: tf.image.decode_jpeg(image_string, channels=3),
            lambda: tf.image.decode_png(image_string, channels=3))
        image_resized = tf.image.resize_images(image_decoded, [90, 90])
        return image_resized, label

    filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg", ...]
    labels = [0, 37, 29, 1, ...]
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(_parse_function)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
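
A usage sketch for the pipeline above (TF 1.x session API; batch_size and the real file lists are assumed to be defined elsewhere):

    # Pull one batch from the iterator defined above.
    images, batch_labels = iterator.get_next()
    with tf.Session() as sess:
        image_batch, label_batch = sess.run([images, batch_labels])
        print(image_batch.shape)  # e.g. (batch_size, 90, 90, 3)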

@Coderx7 Thanks for checking. Those look like entirely reasonable checks on your input data set.

The kinds of checks that you’re asking for seem like reasonable feature requests. I can’t speak for the object detection team about whether they have time to do that kind of work. @tombstone, can you comment?

In the meantime, I’ll mark this as “contributions welcome”.

@cy89: Thank you very much for your attention, but that's what I did, and that's why I think there is a need for better error messages. If there is an issue in the input, it must be caught when the TFRecord is being created. I have checked multiple times that my images are consistent, i.e. a PNG is indeed a PNG and not mistakenly saved as a .jpg or vice versa, and that the correct depth is retrieved and used. I checked all image headers for such discrepancies and fixed them, yet I still get this error. Here is the code I used to check each image and retrieve its true type:

// Reads the leading bytes of the stream and matches them against known image
// magic numbers (requires System.IO); returns "jpg", "bmp", "gif", "png",
// or the raw hex read so far if nothing matched.
private string IsImage(Stream stream)
{
	string jpg = "FFD8";
	string bmp = "424D" ;
	string gif = "474946" ;
	string png = "89504E470D0A1A0A" ;
	string sig = "";

	stream.Seek(0, SeekOrigin.Begin);
	for (int i = 0; i < 8; i++)
	{
		sig += stream.ReadByte().ToString("X2");
		if (sig.Length == 4 && sig == jpg)
		{
			sig = "jpg";
			break;
		}
		else if(sig.Length == 4 && sig == bmp)
		{
			sig = "bmp";
			break;
		}
		else if (sig.Length == 6 && sig == gif)
		{
			sig = "gif";
			break;
		}
		else if (sig.Length == 16 && sig == png)
		{
			sig = "png";
			break;
		}
	}
	return sig;
}

And for image depth (number of channels) I used OpenCV (EmguCV).
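
For reference, here is the equivalent channel check as a Python/OpenCV sketch (the original check was done with EmguCV; the images/ path is a placeholder):

    # Channel-count check, Python/OpenCV equivalent of the EmguCV check.
    import glob

    import cv2

    for path in glob.glob('images/**/*.*', recursive=True):
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keep the alpha channel if present
        if img is None:
            print(path, '- could not be decoded')
        elif img.ndim == 3 and img.shape[2] == 4:
            print(path, '- has 4 channels (RGBA)')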

I don't know what else I was supposed to do that I have not done already. That's why it strikes me as a bug; or, if it's not a bug, it's clear that the error message is not helping at all and more information needs to be disclosed. Better yet, all input-related checks should be done at the time of creating the dataset (i.e. the TFRecords) for easier error handling and troubleshooting. Right now I'm clueless and I have no idea where to dig in!