ros2cli: ros2 topic info -v prints incorrect QoS info for macOS + CycloneDDS

Bug report

Required Info:

  • Operating System:
    • macOS 10.14 Mojave
  • Installation type:
  • Version or commit hash:
    • Foxy prerelease
  • DDS implementation:
    • CycloneDDS
  • Client library (if applicable):
    • N/A

Steps to reproduce issue

ros2 topic info -v does not print correct QoS information for a topic.

➜  ~ ros2 topic pub /talker --qos-durability volatile std_msgs/String "data: Hello World volatile"

➜  ~ ros2 topic info -v /talker
Type: std_msgs/msg/String

Publisher count: 1

Node name: _CREATED_BY_BARE_DDS_APP_
Node namespace: _CREATED_BY_BARE_DDS_APP_
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: a3.6b.10.01.32.8a.d9.a4.bb.0c.3b.46.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL
  Lifespan: 2147483651294967295 nanoseconds
  Deadline: 2147483651294967295 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 2147483651294967295 nanoseconds

Subscription count: 0

Expected behavior

Durability value printed should be: Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE

Actual behavior

Durability value printed is: Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL

Additional information

These commands appear to work as expected on Linux and Windows for CycloneDDS.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 38 (16 by maintainers)

Most upvoted comments

@clalancette

If @eboasson agrees, then I think we’ll want to open up an issue there. Thanks.

Anyway, I think it’s time to pull eProsima in. I have created an issue on Fast-DDS.

@iuhilnehc-ynos perhaps you need not do what I asked just now: after I published the comment, I realised that I could try this particular experiment of mixing a Cyclone DDS-based publisher with a Fast-RTPS-based ros2 cli. I get exactly the same result. That’s good news.

I’ve checked with Wireshark: image

Cyclone only publishes QoS settings that are different from the default. In section 9.6.2.2.1 the DDSI-RTPS specification states that for parameters (e.g., durability QoS, a.k.a. PID_DURABILITY) missing from a discovery message, the default should be applied, and then references the DDS specification for the default

image

which is volatile per section 2.1.3 of the DCPS spec, so Cyclone DDS is entirely correct:

image

and Fast-RTPS or its RMW layer is misinterpreting the discovery data.

I suspect that when @fujitatomoya reproduced it, the ros2 daemon was running using Fast-RTPS. The daemon then stores the incorrect discovery data and regurgitates it to “ros2 topic info” even if the CLI itself is running Cyclone. Restarting the machine/container could easily have resulted in using Cyclone DDS for the daemon.

And if instead you use Cyclone DDS for the daemon, even Fast-RTPS gets it right:

$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 topic pub /talker1234 --qos-durability volatile  std_msgs/String "data: Hello World volatile" > /dev/null 2>&1 &

$ ros2 daemon stop
$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 topic info -v /talker1234

Type: std_msgs/msg/String

Publisher count: 1

Node name: _ros2cli_28883
Node namespace: /
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 5c.18.10.01.85.70.a9.07.7d.e5.91.87.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE               # good
  Lifespan: 9223372036854775807 nanoseconds
  Deadline: 9223372036854775807 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 9223372036854775807 nanoseconds

Subscription count: 0

$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 topic info -v /talker1234

Type: std_msgs/msg/String

Publisher count: 1

Node name: _ros2cli_28883
Node namespace: /
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 5c.18.10.01.85.70.a9.07.7d.e5.91.87.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE               # good
  Lifespan: 9223372036854775807 nanoseconds
  Deadline: 9223372036854775807 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 9223372036854775807 nanoseconds

Subscription count: 0

$ ros2 daemon stop
The daemon has been stopped
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 topic info -v /talker1234  

Type: std_msgs/msg/String

Publisher count: 1

Node name: _CREATED_BY_BARE_DDS_APP_
Node namespace: _CREATED_BY_BARE_DDS_APP_
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 5c.18.10.01.85.70.a9.07.7d.e5.91.87.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL               # bad
  Lifespan: 2147483651294967295 nanoseconds
  Deadline: 2147483651294967295 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 2147483651294967295 nanoseconds

Subscription count: 0

$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 topic info -v /talker1234

Type: std_msgs/msg/String

Publisher count: 1

Node name: _CREATED_BY_BARE_DDS_APP_
Node namespace: _CREATED_BY_BARE_DDS_APP_
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 5c.18.10.01.85.70.a9.07.7d.e5.91.87.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL               # bad
  Lifespan: 2147483651294967295 nanoseconds
  Deadline: 2147483651294967295 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 2147483651294967295 nanoseconds

Subscription count: 0

@fujitatomoya

Could someone double-check on macOS to make sure, since this was originally reported for macOS?

I have confirmed that all these issues will be fixed on macOS after https://github.com/eProsima/Fast-DDS/pull/1384 is merged. (https://github.com/eProsima/Fast-DDS/pull/1382 is already merged.)

chenlh@lihuideMacBook-Pro ROS2 % ros2 topic info -v /talker1234
Type: std_msgs/msg/String

Publisher count: 1

Node name: _ros2cli_2279
Node namespace: /
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 9c.d0.10.01.ef.00.1e.2b.04.29.a2.c8.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE
  Lifespan: 2147483651294967295 nanoseconds
  Deadline: 2147483651294967295 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 2147483651294967295 nanoseconds

Subscription count: 0

@iuhilnehc-ynos

FastDDS expects to receive fastdds::dds::PID_PARTICIPANT_GUID as part of the DATA(w) information, but CycloneDDS only sends fastdds::dds::PID_PARTICIPANT_GUID in DATA(p), not in DATA(w). (The spec does not seem to mark these parameter IDs as ‘must’ or ‘optional’.)

This is an interesting one. Quoting from section 9.6.2.2 in the DDSI-RTPS spec:

For optimization, implementations of the protocol shall not include a parameter in the Data submessage if it contains information that is redundant with other parameters already present in that same Data submessage. As a result of this optimization an implementation shall omit the serialization of the parameters listed in Table 9.10.

The participant GUID is entirely redundant because it is always the same as the reader/writer GUID with the entity id component replaced by 0x1c1. So in my view the normative text says the participant GUID should be omitted and the referenced table is therefore simply an incomplete list of “forbidden” items.
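To make the redundancy concrete, here is a minimal Python sketch of that derivation. It assumes that the first 16 bytes of the GID printed by ros2 topic info -v are the DDSI-RTPS GUID (12-byte prefix plus 4-byte entity id) and that the trailing bytes are padding; the helper names are made up for illustration.

```python
# Entity id of the participant itself: the 0x1c1 mentioned above, as 4 bytes.
ENTITYID_PARTICIPANT = bytes([0x00, 0x00, 0x01, 0xC1])


def parse_gid(text: str) -> bytes:
    """Parse a dotted-hex GID and keep the first 16 bytes (assumed to be the GUID)."""
    return bytes(int(part, 16) for part in text.split("."))[:16]


def participant_guid(endpoint_guid: bytes) -> bytes:
    """Derive the participant GUID from a 16-byte reader/writer GUID."""
    if len(endpoint_guid) != 16:
        raise ValueError("expected a 16-byte GUID: 12-byte prefix + 4-byte entity id")
    # Keep the 12-byte prefix, swap the entity id for the participant's.
    return endpoint_guid[:12] + ENTITYID_PARTICIPANT


# GID of the publisher from the original report.
writer = parse_gid(
    "a3.6b.10.01.32.8a.d9.a4.bb.0c.3b.46.00.00.08.03.00.00.00.00.00.00.00.00"
)
print(participant_guid(writer).hex("."))
# a3.6b.10.01.32.8a.d9.a4.bb.0c.3b.46.00.00.01.c1
```

So a receiver that needs the participant GUID can always reconstruct it from the endpoint GUID it already has.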

I guess one could hold a different opinion and argue that the intent behind that paragraph is that only those listed in table 9.10 should be left out. But the fact of the matter is that neither OpenSplice nor Cyclone DDS has ever published it in the reader/writer information, and in a decade that has not caused any interoperability issues. I have furthermore checked some packet captures from interoperability investigations in that decade, and they show that Connext and CoreDX also leave it out. (That also means I would expect the same all-zero GUID to show when using Connext instead of Cyclone in this experiment.)

Finally, if eProsima is of the opinion that the participant GUID must be present in the reader/writer information, they should reject the discovery data as invalid (maybe I should try fuzzing it?) and not use a nonsensical default value instead.

When the CREATED_BY_BARE_DDS_APP problem is observed, RTPSParticipantKey() in process_discovery_info is all zero (confirmed). It should be something like rmw_gid_t 1 64 16 1 -46 123 40 76 -73 127 -106 -118 0 0 1 -63 0 0 0 0 0 0 0 0, which is the same rmw_gid_t as node_listener’s msg.gid.data.

And the question is: why? @MiguelCompany, could you kindly share your thoughts? This is really getting into the DDS implementation.

No worries, @iuhilnehc-ynos, there’re just too many details to be on top of them all.

FastDDS does have a problem. If WriterProxyData::readFromCDRMessage does not receive a durability parameter, the QoS keeps whatever the default-constructed WriterQos m_qos set:

WriterQos::WriterQos()
{
    this->m_reliability.kind = RELIABLE_RELIABILITY_QOS;
    this->m_durability.kind = TRANSIENT_LOCAL_DURABILITY_QOS;    // why not use the DDS default value here?
}

Yes, that totally explains it
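As a rough illustration of why the constructor matters, the Python sketch below models a discovery message as a dict holding only the parameters that were actually sent. It is a toy model, not Fast DDS code: the function and constant names are hypothetical, and only the two durability values come from this thread.

```python
# Default durability per the DDS spec, as cited earlier in the thread.
DDS_DEFAULT_DURABILITY = "VOLATILE"
# Value the quoted WriterQos() constructor starts with.
CONSTRUCTOR_DURABILITY = "TRANSIENT_LOCAL"


def discovered_durability(parameters: dict, initial: str) -> str:
    """Return the durability of a remote writer.

    `parameters` contains only the PIDs present in the discovery message;
    `initial` is what the proxy object holds before parsing starts.
    """
    # A missing PID leaves the initial value untouched.
    return parameters.get("PID_DURABILITY", initial)


# Cyclone DDS leaves PID_DURABILITY out for a volatile writer (it is the default).
volatile_writer = {"PID_RELIABILITY": "RELIABLE"}

print(discovered_durability(volatile_writer, CONSTRUCTOR_DURABILITY))  # TRANSIENT_LOCAL
print(discovered_durability(volatile_writer, DDS_DEFAULT_DURABILITY))  # VOLATILE
```

When the parameter is present, both variants agree. Only the omitted-parameter case exposes the difference, which matches what ros2 topic info -v shows here.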

But I can’t find where the DDSI-RTPS specification says this: “In section 9.6.2.2.1 the DDSI-RTPS specification states that for parameters (e.g., durability QoS, a.k.a. PID_DURABILITY) missing from a discovery message, the default should be applied”.

It is this paragraph:

For backwards compatibility, both subspaces are subdivided again. If a ParameterId is expected, but not present, the protocol will assume the default value. Similarly, if a ParameterId is present but not recognized, the protocol will either skip and ignore the parameter or treat the parameter as an incompatible QoS. The actual behavior depends on the ParameterId value, see Table 9.11.

(Page 169 of DDSI-RTPS 2.3, emphasis mine)

Table 9.11 then says “see DDS specification”, which is the big table in the DDS 1.4 spec starting on page 2-104.

Or, let me put it another way: why does cyclonedds send PID_RELIABILITY even if it’s the default value?

It is because it isn’t actually set to the default value: the reliability QoS is a pair of kind (best-effort or reliable) and a “max blocking time” that says for how long the writer should block when resource limits prevent it from completing the write operation. The DDS default for “max blocking time” is 100ms, but the Cyclone RMW layer sets it to ∞ and that’s the reason it is included.

I now see that Wireshark doesn’t print the “max blocking time” in the table. It doesn’t affect reader/writer matching, so in that respect it is not very important, but it is a required part of the wire format so it is kinda odd that Wireshark omits it. (Required because the DDSI-RTPS spec says the type of the “reliability” parameter in the discovery is ReliabilityQosPolicy, in turn defined in the DDS spec, although with a footnote that “The encoding of DDS::ReliabilityQoSPolicy::kind is defined by RTPS::ReliabilityKind_t (9.3.2)”.)

image

It’s the “ff ff ff 7f ff ff ff ff” bit in the highlighted bytes.
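For anyone who wants to decode those bytes by hand, a small Python sketch follows. It assumes the duration is a little-endian pair of a signed 32-bit seconds field and an unsigned 32-bit fraction field; the byte order is an assumption about this capture, not something stated above.

```python
import struct

# The highlighted bytes from the capture.
raw = bytes.fromhex("ff ff ff 7f ff ff ff ff")

# "<iI": little-endian signed 32-bit seconds, unsigned 32-bit fraction.
seconds, fraction = struct.unpack("<iI", raw)
print(seconds, fraction)
# 2147483647 4294967295

# Adding the two fields as if the second one were nanoseconds.
as_nanoseconds = seconds * 10**9 + fraction
print(as_nanoseconds)
# 2147483651294967295

# Largest signed 64-bit value, for comparison.
print(2**63 - 1)
# 9223372036854775807
```

The first sum is the odd-looking "2147483651294967295 nanoseconds" in the outputs that went through Fast-RTPS, and the last value is what the Cyclone-based outputs print. That is only an observation about the arithmetic, not a claim about how either RMW layer converts infinite durations.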

@fujitatomoya

Could someone try the above procedure to confirm that it reproduces the issue, just in case?

Yes, I can reproduce this issue with your steps. (I agree that we can use a different RMW to get information from the others.) I can also reproduce it in a container with the following steps.

  • terminal 1
$ export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
$ ros2 topic pub /talker1234 --qos-durability volatile  std_msgs/String "data: Hello World volatile" > /dev/null 2>&1
  • terminal 2
$ ros2 daemon stop
$ export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp

$ ros2 topic info -v /talker1234

1599459365.320665 [0]       ros2: using network interface eno1 (udp/192.168.0.61) selected arbitrarily from: eno1, docker0
Type: std_msgs/msg/String

Publisher count: 1

Node name: _ros2cli_783
Node namespace: /
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 49.a5.10.01.7d.f9.95.aa.aa.a2.1a.b4.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE                     # good, it's correct.
  Lifespan: 9223372036854775807 nanoseconds
  Deadline: 9223372036854775807 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 9223372036854775807 nanoseconds

Subscription count: 0


$ ros2 daemon stop
$ export RMW_IMPLEMENTATION=rmw_fastrtps_cpp

$ ros2 topic info -v /talker1234
Type: std_msgs/msg/String

Publisher count: 1

Node name: _CREATED_BY_BARE_DDS_APP_
Node namespace: _CREATED_BY_BARE_DDS_APP_
Topic type: std_msgs/msg/String
Endpoint type: PUBLISHER
GID: 49.a5.10.01.7d.f9.95.aa.aa.a2.1a.b4.00.00.08.03.00.00.00.00.00.00.00.00
QoS profile:
  Reliability: RMW_QOS_POLICY_RELIABILITY_RELIABLE
  Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL             # bad.
  Lifespan: 2147483651294967295 nanoseconds
  Deadline: 2147483651294967295 nanoseconds
  Liveliness: RMW_QOS_POLICY_LIVELINESS_AUTOMATIC
  Liveliness lease duration: 2147483651294967295 nanoseconds

Subscription count: 0

@eboasson

# CYCLONEDDS_URI='<Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</>'
# ros2 topic pub /talker --qos-durability volatile std_msgs/String "data: Hello World volatile" & sleep 2 ; ros2 topic info -v /talker ; kill % ; wait
# grep -E '(SEDP|WRITER).*topic_name="rt/talker"' cdds.log.*
grep: cdds.log.*: No such file or directory

I cannot get the log file cdds.log… did I miss something?

Foolish me … trying to be helpful giving copy-paste-ready lines and then forgetting the export CYCLONEDDS_URI part …

After restarting the container, I am no longer able to reproduce this issue with the steps in #525 (comment). I don’t know what to say, sorry 😢 I’ll try to find another environment.

Hopefully you can find one without much effort 🤞