tensorflow-deeplab-resnet: Errors while trying to fine-tune the model
Hello,
I completed all the steps and used fine_tune.py to fine-tune the model, but I got some errors. The number of classes is 2. I used this command: python fine_tune.py --not-restore-last
I used these parameters:
IMG_MEAN = np.array((145.2201, 119.0066, 97.9356), dtype=np.float32)
BATCH_SIZE = 1
DATA_DIRECTORY = '/home/hesam/Desktop/2/train'
DATA_LIST_PATH = 'data/train.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 1e-4
NUM_CLASSES = 2
NUM_STEPS = 20000
RANDOM_SEED = 1234
RESTORE_FROM = 'data/deeplab_resnet.ckpt'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 100
SNAPSHOT_DIR = 'data'
and
# colour map
label_colours = [(0,0,0),(131,0,5)]
The errors:
(tensorflow) hesam@hesam-MS-7994:~/Desktop/tensorflow-deeplab-resnet-master$ python fine_tune.py --not-restore-last
Couldn't import dot_parser, loading of dot files will not be possible.
2017-04-26 11:26:40.077467: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-26 11:26:40.077492: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-26 11:26:40.077496: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-26 11:26:40.077499: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-26 11:26:40.077502: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-26 11:26:40.195856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-26 11:26:40.196757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.759
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.54GiB
2017-04-26 11:26:40.196771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-04-26 11:26:40.196775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-04-26 11:26:40.196780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
Restored model parameters from data/deeplab_resnet.ckpt
2017-04-26 11:26:46.111049: W tensorflow/core/framework/op_kernel.cc:1152] Not found: /home/hesam/Desktop/2/train/labels/0039.png
2017-04-26 11:26:46.112367: W tensorflow/core/framework/op_kernel.cc:1152] Not found: /home/hesam/Desktop/2/train/labels/0039.png
[[Node: create_inputs/ReadFile_1 = ReadFile[_device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/input_producer/Gather_1)]]
2017-04-26 11:26:46.222843: W tensorflow/core/framework/op_kernel.cc:1152] Out of range: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
2017-04-26 11:26:46.223115: W tensorflow/core/framework/op_kernel.cc:1152] Out of range: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
Traceback (most recent call last):
File "fine_tune.py", line 207, in <module>
main()
File "fine_tune.py", line 196, in main
loss_value, images, labels, preds, summary, _ = sess.run([reduced_loss, image_batch, label_batch, pred, total_summary, optim])
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
Caused by op u'create_inputs/batch', defined at:
File "fine_tune.py", line 207, in <module>
main()
File "fine_tune.py", line 125, in main
image_batch, label_batch = reader.dequeue(args.batch_size)
File "/home/hesam/Desktop/tensorflow-deeplab-resnet-master/deeplab_resnet/image_reader.py", line 179, in dequeue
num_elements)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
name=name)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/hesam/Desktop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
About this issue
- Original URL: https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/75
- State: closed
- Created 7 years ago
- Comments: 28 (11 by maintainers)
TL;DR: there was a bug, which is now fixed. Please clone the repository again.
Please take a look here: https://github.com/martinkersner/train-DeepLab#data-conversions (the second paragraph). This function should also be useful for performing the conversion: https://github.com/martinkersner/train-DeepLab/blob/master/utils.py#L91
Your label masks are 3-D tensors, i.e. colour images: they contain values like 0, 128, 192, 224. Instead, each 3-D colour vector should be converted to a single number (the class index), so that every mask is a single-channel image. You need to save your annotations without a colourmap, as also described in the DeepLab FAQ: http://liangchiehchen.com/projects/DeepLab_FAQ.html
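As a rough illustration of that conversion, here is a minimal sketch that maps every RGB colour in a mask to a class index. The `PALETTE` dictionary, the function names and the choice to mark unknown colours with the ignore label are assumptions made for this example, not code from the repository; replace the palette with the colours that actually occur in your annotations.

```python
import numpy as np

# Hypothetical palette: RGB colour -> class index. The colours below are only
# placeholders; use the colours that really appear in your annotation files.
PALETTE = {
    (0, 0, 0): 0,    # background
    (131, 0, 5): 1,  # foreground
}
IGNORE_LABEL = 255


def rgb_mask_to_index(mask_rgb, palette=PALETTE, ignore_label=IGNORE_LABEL):
    """Map an (H, W, 3) colour mask to an (H, W) uint8 mask of class indices."""
    mask_rgb = np.asarray(mask_rgb)
    if mask_rgb.ndim != 3 or mask_rgb.shape[2] < 3:
        raise ValueError("expected an (H, W, 3) array, got shape %s" % (mask_rgb.shape,))
    rgb = mask_rgb[:, :, :3]  # drop an alpha channel if there is one
    # Pixels whose colour is not in the palette keep the ignore label.
    index = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for colour, class_id in palette.items():
        matches = np.all(rgb == np.array(colour), axis=-1)
        index[matches] = class_id
    return index


def convert_file(src_path, dst_path):
    """Read a colour PNG and write a single-channel PNG (requires Pillow)."""
    from PIL import Image  # imported lazily so the core function only needs NumPy
    rgb = np.array(Image.open(src_path).convert("RGB"))
    Image.fromarray(rgb_mask_to_index(rgb)).save(dst_path)
```

After converting, the saved masks should contain only the values 0, 1 and (optionally) 255, which matches NUM_CLASSES = 2 and IGNORE_LABEL = 255 from the settings in the question.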
No need to rename. Just use the path 'model.ckpt-100': it will pick up everything. For example, take a look here for an analogous problem: https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/36
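To see why the prefix is enough: a TensorFlow checkpoint saved as 'model.ckpt-100' is normally stored as several files that share that prefix (an .index file, one or more .data-* shards and a .meta file). The small helper below is a hypothetical sketch, not part of the repository, that lists the files belonging to a prefix so you can check what is on disk before restoring.

```python
import glob


def checkpoint_files(prefix):
    """List the files on disk that belong to a checkpoint prefix.

    For a prefix such as 'data/model.ckpt-100' this typically returns the
    .index, .data-* and .meta files written by the saver.
    """
    # glob.escape protects any special characters in the directory name.
    return sorted(glob.glob(glob.escape(prefix) + ".*"))
```

If the list comes back empty, the prefix is probably wrong (for example, it points at one of the individual files instead of the common prefix).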
It looks like the files are not present:
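One way to confirm this before training is to walk the list file and check every referenced path, since a single missing label (such as the 0039.png reported in the log) is enough to close the input queue and trigger the OutOfRangeError. The sketch below assumes that each line of the list file holds an image path and a label path separated by whitespace, both relative to the data directory; the function name is made up for this example.

```python
import os


def find_missing_files(data_dir, data_list_path):
    """Return the paths referenced by the list file that do not exist on disk."""
    missing = []
    with open(data_list_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            for rel_path in parts:
                # Assumption: entries are relative to data_dir, with or
                # without a leading slash.
                full_path = os.path.join(data_dir, rel_path.lstrip("/"))
                if not os.path.isfile(full_path):
                    missing.append(full_path)
    return missing
```

With the settings from the question, calling `find_missing_files('/home/hesam/Desktop/2/train', 'data/train.txt')` should return an empty list once every image and label is in place.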