tensorpack: Exception in thread EnqueueThread QueueInput/input_queue - msgpack exceeds max_bin_len()
1. What you did:
(1) If you’re using examples, what’s the command you run:
python3 train.py --config MODE_MASK=True MODE_FPN=True DATA.BASEDIR="coco" BACKBONE.WEIGHTS=R50.npz
(2) If you’re using examples, have you made any changes to the examples? Paste git status; git diff here:
Changed the config to adapt it to the WIDER FACE dataset.
2. What you observed:
(1) Include the ENTIRE logs here:
[0515 23:27:44 @base.py:275] Start Epoch 15 ...
[0515 23:28:41 @input_source.py:173] ERR [EnqueueThread] Exception in thread EnqueueThread QueueInput/input_queue:
Traceback (most recent call last):
File "/home/nga/FasterRCNN/tensorpack/input_source/input_source.py", line 161, in run
dp = next(self._itr)
File "/home/nga/FasterRCNN/tensorpack/dataflow/common.py", line 370, in __iter__
for dp in self.ds:
File "/home/nga/FasterRCNN/tensorpack/dataflow/parallel_map.py", line 314, in __iter__
for dp in super(MultiProcessMapDataZMQ, self).__iter__():
File "/home/nga/FasterRCNN/tensorpack/dataflow/parallel_map.py", line 89, in __iter__
for dp in self.get_data_non_strict():
File "/home/nga/FasterRCNN/tensorpack/dataflow/parallel_map.py", line 65, in get_data_non_strict
ret = self._recv()
File "/home/nga/FasterRCNN/tensorpack/dataflow/parallel_map.py", line 309, in _recv
dp = loads(msg[1])
File "/home/nga/FasterRCNN/tensorpack/utils/serialize.py", line 43, in loads_msgpack
max_str_len=MAX_MSGPACK_LEN)
File "/home/nga/.local/lib/python3.6/site-packages/msgpack_numpy.py", line 214, in unpackb
return _unpackb(packed, **kwargs)
File "msgpack/_unpacker.pyx", line 200, in msgpack._unpacker.unpackb
ValueError: 1916481600 exceeds max_bin_len(1000000000)
[0515 23:28:41 @input_source.py:179] [EnqueueThread] Thread EnqueueThread QueueInput/input_queue Exited.
[0515 23:28:58 @base.py:291] Training was stopped by exception FIFOQueue '_0_QueueInput/input_queue' is closed and has insufficient elements (requested 1, current size 0)
[[node QueueInput/input_deque (defined at /home/nga/FasterRCNN/tensorpack/input_source/input_source.py:272) = QueueDequeueV2[component_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT64, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](QueueInput/input_queue)]]
Caused by op 'QueueInput/input_deque', defined at:
File "train.py", line 120, in <module>
launch_train_with_config(traincfg, trainer)
File "/home/nga/FasterRCNN/tensorpack/train/interface.py", line 91, in launch_train_with_config
model.build_graph, model.get_optimizer)
File "/home/nga/FasterRCNN/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/home/nga/FasterRCNN/tensorpack/train/tower.py", line 224, in setup_graph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "/home/nga/FasterRCNN/tensorpack/train/trainers.py", line 189, in _setup_graph
grad_list = self._builder.call_for_each_tower(tower_fn)
File "/home/nga/FasterRCNN/tensorpack/graph_builder/training.py", line 226, in call_for_each_tower
use_vs=[False] + [True] * (len(self.towers) - 1))
File "/home/nga/FasterRCNN/tensorpack/graph_builder/training.py", line 122, in build_on_towers
return DataParallelBuilder.call_for_each_tower(*args, **kwargs)
File "/home/nga/FasterRCNN/tensorpack/graph_builder/training.py", line 117, in call_for_each_tower
ret.append(func())
File "/home/nga/FasterRCNN/tensorpack/train/tower.py", line 252, in get_grad_fn
inputs = input.get_input_tensors()
File "/home/nga/FasterRCNN/tensorpack/input_source/input_source_base.py", line 83, in get_input_tensors
return self._get_input_tensors()
File "/home/nga/FasterRCNN/tensorpack/input_source/input_source.py", line 272, in _get_input_tensors
ret = self.queue.dequeue(name='input_deque')
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 435, in dequeue
self._queue_ref, self._dtypes, name=name)
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3741, in queue_dequeue_v2
timeout_ms=timeout_ms, name=name)
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/nga/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_0_QueueInput/input_queue' is closed and has insufficient elements (requested 1, current size 0)
[[node QueueInput/input_deque (defined at /home/nga/FasterRCNN/tensorpack/input_source/input_source.py:272) = QueueDequeueV2[component_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT64, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](QueueInput/input_queue)]]
(2) Other observations, if any: The error occurs at a random epoch (sometimes 12, sometimes 6 or 9, and so on).
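Since the failure happens at a random epoch, it is likely that only an occasional, unusually large datapoint (here ~1.92 GB serialized, against the 1 GB default) trips the deserializer. A rough diagnostic sketch to confirm this, assuming the FasterRCNN example's get_train_dataflow() from data.py is available after the config is set up (names are illustrative):

```python
# Diagnostic sketch: serialize each datapoint the same way the input queue does
# and report any that exceed the default 1 GB msgpack limit.
from tensorpack.utils.serialize import dumps
from data import get_train_dataflow  # assumption: the example's data.py

df = get_train_dataflow()
df.reset_state()
for i, dp in enumerate(df):
    size = len(dumps(dp))
    if size > 1_000_000_000:  # default MAX_MSGPACK_LEN
        print("datapoint %d serializes to %.2f GB, above max_bin_len" % (i, size / 1e9))
```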
3. What you expected, if not obvious.
How can I resolve this issue?
4. Your environment:
Python==3.6.7
Tensorpack==0.9.4
TensorFlow==1.12.0
msgpack==0.5.6
1 GPU: GeForce GTX 1080; free RAM: 124.25/125.82 GB
OMG. Then this is a real issue. For now you can unblock yourself by changing the size limit at: https://github.com/tensorpack/tensorpack/blob/42416945c1e36a5f1d4350ee9c1ae8b134cbe841/tensorpack/utils/serialize.py#L19
I’ll see if there’s a better way to fix this.
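For reference, a minimal sketch of that workaround: rather than editing serialize.py in place, the same constant can probably be overridden from the training script before the input pipeline starts, since the traceback shows loads_msgpack() reading the module-level MAX_MSGPACK_LEN at call time (treat this as an assumption, not a confirmed API).

```python
# Workaround sketch (assumption: overriding the module-level constant before the
# dataflow/queue threads start has the same effect as editing serialize.py).
# The failing datapoint serialized to ~1.92 GB, above the 1e9 default, so choose
# a limit comfortably larger than the biggest datapoint you expect.
import tensorpack.utils.serialize as tp_serialize

tp_serialize.MAX_MSGPACK_LEN = 2 ** 31  # ~2.1 GB

# ...then build the dataflow and call launch_train_with_config() as usual.
```

Either way, the underlying cause is a single datapoint close to 2 GB; shrinking the datapoint itself (for example, smaller images or fewer instances per image) would likely avoid the limit entirely.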