Multicast Route Affects RELIABLE Reliability


I have three computers communicating via a VPX switch in a VPX chassis. They are running CentOS 7. On one computer, I have configured the default route for multicast addresses, as follows: route add netmask eth0. On computer 1, I am running Wireshark; on computer 2, I am running Application A; and on computer 3, I am running Application B. Application A is running on the computer with the default multicast route. Applications A and B are modern C++ applications built against RTI Connext 5.3.1 for the target.
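The route command above appears to have lost its address arguments when posted. The usual form of a catch-all multicast route on Linux, assuming eth0 is the VPX-facing interface, would be something like:

```shell
# Send all of 224.0.0.0/4 (the entire IPv4 multicast range) out eth0.
# Requires root; adjust the interface name for your system.
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0

# Equivalent with the modern iproute2 tooling:
ip route add 224.0.0.0/4 dev eth0
```

This is a reconstruction based on the description "default route for multicast addresses", not the exact command that was run.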

I am using the Generic.StrictReliable QoS profile, with no amendments. When Application B comes up, I see its IGMPv3 Membership Report come through in Wireshark. When Application A comes up, I do not see one. Regardless, I see an indication that Application B has received the initialization message from Application A, and that A has started sending the periodic status message. After 10 seconds, Application A fails with a timeout error while waiting for acknowledgement of the first status message.

If I re-run with my own instrumentation, I see the inter-message time for a different heartbeat message bouncing around from its nominal 200 ms to several seconds. I understand that in StrictReliable, the write call blocks until the sample is acknowledged. On a separate, non-production target with the same operating system, the system runs nominally. If I run Wireshark on the board running Application A, I see the IGMPv3 Membership Report for that board, as well as the report for the board running Application B. If I delete the default route for multicast addresses, my application runs nominally, but it can't communicate with the other multicast-enabled components of the system. I understand that I can add a route for my specific multicast groups instead. It seems that the outbound messages are fine, but the heartbeat responses (ACKNACKs) of the reliable QoS are being lost somehow.
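One way to check whether the ACKNACK replies from Application B are actually reaching Application A's board is to capture on that board directly. A sketch, assuming eth0 is the VPX-facing interface and the default RTPS port range for low domain IDs (both are assumptions):

```shell
# Inspect the current multicast routing and group memberships on this host.
ip route show
ip maddr show dev eth0

# Capture IGMP plus RTPS traffic in the default port range; Wireshark's
# RTPS dissector can then distinguish HEARTBEAT from ACKNACK submessages
# in the saved capture. Requires root.
tcpdump -i eth0 -w rtps.pcap 'igmp or udp portrange 7400-7500'
```

If HEARTBEATs leave Application A but no ACKNACKs arrive back, that would point at the return (unicast) path rather than the multicast data path.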

I saw an article that mentioned the existence of asymmetric discovery. I would like to know if and how I can instrument the software, or use RTI tools, to diagnose this problem. I try to avoid unverified failures, and would like to reach a positive conclusion about why I was seeing these effects.
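One concrete way to narrow the capture when diagnosing is to filter on the exact RTPS ports in use. The well-known RTPS v2 port mapping, which Connext follows unless the transport QoS overrides it, can be computed as follows; domain 0 and participant index 0 here are assumptions:

```shell
# Well-known RTPS v2 port mapping (spec defaults):
#   port = PB + DG*domainId + offset  (+ PG*participantId for unicast ports)
# with PB=7400, DG=250, PG=2 and offsets d0=0, d1=10, d2=1, d3=11.
domain=0
participant=0
PB=7400; DG=250; PG=2

spdp_multicast=$((PB + DG*domain + 0))                    # discovery multicast
spdp_unicast=$((PB + DG*domain + 10 + PG*participant))    # discovery unicast
user_multicast=$((PB + DG*domain + 1))                    # user-data multicast
user_unicast=$((PB + DG*domain + 11 + PG*participant))    # user-data unicast

echo "$spdp_multicast $spdp_unicast $user_multicast $user_unicast"
```

For domain 0, participant 0, this yields ports 7400, 7410, 7401, and 7411; the ACKNACKs you are looking for travel to the writer's unicast ports.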


UPDATE: Any route in the routing table blows up RELIABLE reliability, causing timeout errors, even a plain route add netmask eth0.