ifm3d: Publishers Occasionally Hang
Hello,
We’ve encountered anther issue in which we are unable to get any data out of the ifm publishers. This has come up a few times in the field and I created some test code to reproduce it. I just repeatedly initializes the ifm3d-ros node and waits to receive a distance message. Eventually, it’ll hang waiting for the message. The test output looks something like this:
...
[ifm_left/ifm_left-41] process has died [pid 890, exit code -9, cmd /opt/shining_software/catkin_ws/devel/lib/ifm3d/ifm3d_node distance:=/ifm_left/depth_with_confidence __name:=ifm_left __log:=/home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-41.log].
log file: /home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-41*.log
process[ifm_left/ifm_left-42]: started with pid [937]
[ INFO] [1511305225.314623226]: Running dtors...
[ INFO] [1511305225.314707989]: Initializing camera...
[ INFO] [1511305226.327200369]: Initializing framegrabber...
[ INFO] [1511305227.381932637]: Initializing image buffer...
[ INFO] [1511305227.550609292]: Got unit vectors, restarting framegrabber with mask: 15
[ INFO] [1511305227.550649134]: Running dtors...
[ INFO] [1511305227.551026321]: Initializing camera...
[ INFO] [1511305228.563687840]: Initializing framegrabber...
[ INFO] [1511305229.253269551]: Initializing image buffer...
Received depth message. Shape: 0 0
[ifm_left/ifm_left-42] process has died [pid 937, exit code -9, cmd /opt/shining_software/catkin_ws/devel/lib/ifm3d/ifm3d_node distance:=/ifm_left/depth_with_confidence __name:=ifm_left __log:=/home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-42.log].
log file: /home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-42*.log
process[ifm_left/ifm_left-43]: started with pid [984]
[ INFO] [1511305231.366369760]: Running dtors...
[ INFO] [1511305231.366427715]: Initializing camera...
[ INFO] [1511305232.379364764]: Initializing framegrabber...
[ INFO] [1511305233.429639553]: Initializing image buffer...
[ INFO] [1511305233.586629652]: Got unit vectors, restarting framegrabber with mask: 15
[ INFO] [1511305233.586670046]: Running dtors...
[ INFO] [1511305233.587010254]: Initializing camera...
[ INFO] [1511305234.599095939]: Initializing framegrabber...
[ INFO] [1511305235.301061760]: Initializing image buffer...
Received depth message. Shape: 0 0
[ifm_left/ifm_left-43] process has died [pid 984, exit code -9, cmd /opt/shining_software/catkin_ws/devel/lib/ifm3d/ifm3d_node distance:=/ifm_left/depth_with_confidence __name:=ifm_left __log:=/home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-43.log].
log file: /home/brain/.ros/log/744bdbb0-cf05-11e7-9a7f-0010340b6800/ifm_left-ifm_left-43*.log
process[ifm_left/ifm_left-44]: started with pid [1041]
[ INFO] [1511305237.421985366]: Running dtors...
[ INFO] [1511305237.422060482]: Initializing camera...
[ INFO] [1511305238.435424341]: Initializing framegrabber...
[ INFO] [1511305239.444415053]: Initializing image buffer...
Traceback (most recent call last):
File "ifm_parameter_tests.py", line 120, in <module>
receive_depth_image_test(base_topic)
File "ifm_parameter_tests.py", line 115, in receive_depth_image_test
msg = next(depth_msgs)
File "ifm_parameter_tests.py", line 61, in gen_msgs
yield receive_one(topic, msg_class, 10.0)
File "ifm_parameter_tests.py", line 54, in receive_one
assert await_pred(container.deref, timeout=timeout), "Never received msg from %s after %s seconds" % (topic, timeout)
AssertionError: Never received msg from ifm_left/depth_with_confidence after 10.0 seconds
[ifm_left/ifm_left-44] killing on exit
[rospy.internal][WARNING][2017-11-21 23:00:47,406] Unknown error initiating TCP/IP socket to localhost.localdomain:58023 (http://localhost.localdomain:46241/): Traceback (most recent call last):
File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 557, in connect
self.read_header()
File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 619, in read_header
self._validate_header(read_ros_handshake_header(sock, self.read_buff, self.protocol.buff_size))
File "/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/network.py", line 363, in read_ros_handshake_header
d = sock.recv(buff_size)
error: [Errno 104] Connection reset by peer
[ifm_left/ifm_left-44] escalating to SIGTERM
It’s pretty infrequent, but it does consistently hang eventually on my machine. The process seems to hang after “Initializing image buffer…”. Here’s the test code:
#!/usr/bin/env python
import roslaunch
import rospy
import os
import signal
import time
from sensor_msgs.msg import Image
from multiprocessing import Lock
hostname = "192.168.1.69"
port = 80
base_topic = "ifm_left"
class Atom(object):
# Don't trust assignment thread safety.
def __init__(self, v):
self._lock = Lock()
self._value = v
def reset(self, v):
with self._lock:
self._value = v
def deref(self):
with self._lock:
return self._value
def kill_nodes(launcher, nodes):
for node in nodes:
os.kill(node.pid, signal.SIGKILL)
time.sleep(2.0)
def await_pred(pred, timeout):
start_time = time.time()
while True:
if pred():
return True
dt = time.time() - start_time
if dt > timeout:
return False
def receive_one(topic, msg_class, timeout):
"""
Creates a subscriber, waits to receive a message from the given topic/msg_class,
then unregisters the subscriber. Returns the received message.
"""
container = Atom(None)
sub = rospy.Subscriber(topic, Image, container.reset)
assert await_pred(container.deref, timeout=timeout), "Never received msg from %s after %s seconds" % (topic, timeout)
sub.unregister()
return container.deref()
def gen_msgs(topic, msg_class):
while True:
yield receive_one(topic, msg_class, 10.0)
def launch_ifm(launcher, base_topic):
rospy.set_param(os.path.join(base_topic, base_topic),
{"frame_id_base": base_topic,
"ip": hostname,
"password": "",
"publish_viz_images": False,
"schema_mask": 15,
"timeout_millis": 500,
"timoute_tolerance_secs": 5.0,
"xmlrpc_port": port,
"assume_sw_triggered": False})
node = roslaunch.core.Node(
package="ifm3d",
node_type="ifm3d_node",
name=base_topic,
namespace=base_topic,
output="screen",
respawn=False,
required=False,
remap_args=[['distance', "/%s/depth_with_confidence" % base_topic]]
)
return launcher.launch(node)
def receive_depth_image_test(base_topic):
"""
1. Create a ros launcher instance.
2. Repeatedly:
- Launch ifm node.
- Create subscriber, receive one depth message, then unsubscribe.
- kill ros node.
This test will eventually fail with a timeout error waiting for the
depth message on my machine.
"""
launcher = roslaunch.scriptapi.ROSLaunch()
launcher.start()
# Launch the ros node.
node = launch_ifm(launcher, base_topic)
topic = os.path.join(base_topic, "depth_with_confidence")
depth_msgs = gen_msgs(topic, Image)
msg = next(depth_msgs)
while True:
# Note: the first couple depth messages received after registering a
# subscriber are empty. This is another, probably separate, issue.
print "Received depth message. Shape: ", msg.height, msg.width
kill_nodes(launcher, [node])
node = launch_ifm(launcher, base_topic)
msg = next(depth_msgs)
if __name__ == "__main__":
rospy.init_node("UnitVectorsTest")
receive_depth_image_test(base_topic)
This issue hasn’t come up when testing the ifm3d module from cpp, so my guess is it’s at the ros node level, but idk. Finally, you may also notice the first couple depth messages received after registering a subscriber are empty, but I believe that is a separate issue.
Thanks, John
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 26 (13 by maintainers)
ifm3d-logs.zip I spent the afternoon looking at this and I believe I know what is causing the issue(s).
The short version is, sometimes, the O3X seems to be returning an image chunk header (the leading 48 bytes) filled with zeros rather than the actual meta-information that we expect when fetching the Unit Vectors over XML-RPC. Because this header data was coming back as all zeros, the image chunk parser in the ifm3d framegrabber entered an infinite loop while trying to advance to the next image chunk and parse the rest of the data. I’ve pushed a new/debugging branch to ifm3d here which now traps this error - i.e., it will no longer run in an infinite loop; however, when the unit vector image chunk header is all zeros, it will not know how to advance to the next image chunk and ultimately cause a framegrabber timeout. Looking at the diff for the new branch should make it obvious as to where I am trapping this issue.
I’ve attached two log files with the IFM3D verbosity turned way up that illustrates this problem. In both the “OK” and “FAIL” logs, I have serialized the byte buffer representing the Unit Vectors. You will note on the “FAIL” log, the first 48 bytes (the v2 chunk header) is all zeros. In both logs the serialization of the unit vector buffer follows the message that says: “Unit vectors:” to make your searching easier. I will note that it appears (I did not validate the entire buffer) that except for the header info, the payload of the unit vectors seems to be consistent across the “OK” and “FAIL” logs.
@graugans Can you investigate on the camera side what would cause the Unit Vector image chunk header to come back as all zeros. The test script that @cartesian-theatrics sent triggers this error within a couple of minutes (at most).
@cartesian-theatrics Your test script was helpful to repeatably trigger this error and quickly find where the issue was. Thanks for that. Also, you noted in your message above:
I think you are correct. It is a separate issue. I believe I have that one resolved in the ROS node on this branch Again, thanks for pointing this out.
Once we get this all sorted out: ifm3d, ifm3d-ros, camera firmware we will come up with the proper fix and merge all the changes in. However for now, you can feel free to clone my dev branch for this issue.
Hi Graugans,
No problem. Do you have ROS installed and a ROS workspace created with the ifm3d-ros module in it, i.e. are you able to launch the ifm3d-ros module? If not, you’ll want to install ROS (we’re running ros indigo) and go through this tutorial to create a catkin workspace:
http://wiki.ros.org/action/fullsearch/catkin/Tutorials/create_a_workspace?action=fullsearch&context=180&value=linkto%3A"catkin%2FTutorials%2Fcreate_a_workspace"
Assuming you do have ROS installed and are able to launch ifm3d-ros in some capacity, you should be able to runn the script above by: