Multicast and discovery

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27
Hello,

We are running multiple processes communicating over DDS in the following network:

  • Most computers are in the same subnet and they can communicate over multicast - pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4
  • There are several computers in another subnet and they cannot be reached via multicast - pc-outside1, pc-outside2

So far, the best configuration I have found is the following:

  • Processes inside the subnet get the default addresses (multicast, shmem, localhost) plus the hostnames of the machines outside the subnet:
    • NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2, builtin.udpv4://239.255.0.1
  • Processes outside the subnet get the default addresses without multicast, plus the hostnames of the machines outside the subnet:
    • NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2

With this configuration all processes can discover each other and communicate successfully. Is a better configuration possible?
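The two peer lists above can be generated from one rule: every process gets localhost, shmem and the outside hostnames, and only processes inside the multicast subnet also get the multicast peer. A minimal sketch, assuming a hypothetical helper named discovery_peers (it is not part of any RTI API, it only builds the string for the environment variable):

```python
# Hypothetical helper: builds the NDDS_DISCOVERY_PEERS value described in the post.
# The hostnames and the multicast address are the ones from the post.
DEFAULT_MULTICAST_PEER = "builtin.udpv4://239.255.0.1"
OUTSIDE_HOSTS = ["pc-outside1", "pc-outside2"]


def discovery_peers(in_multicast_subnet, remote_hosts,
                    multicast_peer=DEFAULT_MULTICAST_PEER):
    """Return the comma-separated peer list for one process."""
    peers = ["localhost", "shmem://"] + list(remote_hosts)
    if in_multicast_subnet:
        # Only hosts that can actually receive multicast get the multicast peer.
        peers.append(multicast_peer)
    return ", ".join(peers)


subnet_peers = discovery_peers(True, OUTSIDE_HOSTS)
outside_peers = discovery_peers(False, OUTSIDE_HOSTS)
print("NDDS_DISCOVERY_PEERS=" + subnet_peers)
print("NDDS_DISCOVERY_PEERS=" + outside_peers)
```

The resulting string would be exported as NDDS_DISCOVERY_PEERS before the process starts.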

The problems start when a process outside the subnet is started with a multicast address (even a different one, e.g. builtin.udpv4://224.1.1.1). This process can discover all domain participants but is unable to get any information about publications and subscriptions from the processes running inside the subnet. Publications and subscriptions are received only from other processes running outside the subnet (those without a multicast address specified). RTI Admin Console shows the same behavior:

  • if it's started outside the subnet without a multicast address, all topics are shown correctly
  • if it's started outside the subnet with a multicast address, only topics from processes outside the subnet are shown

Is this the intended behavior? Is it possible to have a multicast discovery address and still get data from another subnet? For example, the following configuration would be desirable:

  • for pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4 (all in one subnet):
    • NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2, builtin.udpv4://239.255.0.1
  • for pc-outside1, pc-outside2 (all in another subnet)
    • NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4, builtin.udpv4://239.255.0.2

I couldn't get such a configuration working because of the behavior described above.

P.S. accept_unknown_peers is 1; all other QoS options are unchanged.

Thanks!

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27

I think I've found an explanation for this behavior:

 Two applications are running on separate machines. Application A has the address of Application B in its initial peers list and has a multicast receive address set. Application B has default initial peers.

  • Why can’t they communicate?: In some network configurations this setup will work just fine. However, in a situation where the two machines are in different multicast networks, discovery will not complete. This happens because Application A will send unicast discovery to Application B. Application B will detect that Application A, like itself, has been setup to receive multicast traffic and will attempt to continue discovery over multicast because this is most efficient. Because the two applications are in different multicast networks, the multicast traffic will be blocked between the two applications, unbeknownst to them, and discovery will fail.

  • Solution: Disable multicast on Application A and/or Application B by clearing their multicast_receive_addresses list to force discovery to go over unicast.

The remaining question is: does this also apply if Application A and Application B have different multicast_receive_addresses?

rip
Offline
Last seen: 3 weeks 4 days ago
Joined: 04/06/2012
Posts: 324

tl;dr:  the issue isn't the multicast receive address in use, it is in how external network equipment is configured.

The outgoing discovery traffic includes a return route: "use this address". So Application B should respond on A's multicast receive address, assuming that A is using B's unicast address for discovery. This does not mean that they will automatically communicate, because whatever prevents A from seeing B on the same multicast receive address may also prevent A from seeing B on a different multicast receive address. It's not the 'address', it's the 'multicast'.

Multicast packets are sent with a certain number of "hops" (the IP TTL), which is what "the multicast traffic will be blocked" is alluding to. An intervening router (or layer-3 switch) subtracts a number (generally 1) from the packet's hop count, and is allowed to drop multicast packets once the count has decreased to some limit (generally 0, though both numbers can be changed by local policy). The two applications can be on the same multicast address or on different multicast addresses and still not see each other, because the hop count fell below the limit somewhere along the way.

Because multicast hops is the main issue here, B responding to A on A's (different) multicast-receive-address may still be blocked by intervening equipment.
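To make the hop count concrete, here is a minimal sketch of how a plain UDP socket sets the TTL of its outgoing multicast packets. It uses only the Python standard library and says nothing about how Connext configures its own sockets; the helper name make_multicast_sender and the value 2 are illustrative assumptions. The OS default multicast TTL is 1, which keeps packets on the local subnet.

```python
import socket
import struct


def make_multicast_sender(ttl):
    """Create a UDP socket whose outgoing multicast packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Each router on the path decrements the TTL; a packet whose TTL is
    # exhausted is not forwarded any further.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock


sender = make_multicast_sender(ttl=2)
effective_ttl = sender.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL)
print("multicast TTL:", effective_ttl)
sender.close()
```

Raising the TTL only helps if the routers in between are also configured to forward multicast, which is the point made above.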

rip

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27

Thanks for the explanation! Could you please confirm my understanding of the A<->B discovery case?

B has multicast enabled with some multicast_receive_address. After B receives discovery information from A, which advertises a different multicast address, does B still try to communicate via multicast?


I have one more question, regarding dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list and discovery:

  • A is configured for discovery over a multicast address and has the address of B
  • B has 2 network interfaces and is reachable from A over the en0 interface via unicast only. The en1 interface should be used for multicast communication with other computers, so B has a different multicast address set, and I additionally set dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list to en0

A and B cannot communicate in this configuration, and I assume that B tries to reach A over en0 via multicast.

What does dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list do? Does it affect only data communication, or discovery as well? My expectation was that this option can be used to disable multicast discovery on one of the interfaces, but that doesn't seem to work.

rip
Offline
Last seen: 3 weeks 4 days ago
Joined: 04/06/2012
Posts: 324

Well, here we are getting deep into networking neepery (a wonderful word that brings together geek, nerd and expert knowledge of a single subject), at which level I do not exist, sorry :)

B will respond on A's reported return-route address (which might be either unicast or multicast, depending on configuration). This is similar behavior to using the new (in 5.1.0) UDP NAT-traversal configuration, by the way. See the discussion of that in the What's New document and Table 15-2 of the User's Manual.

A says "This is the address you can reach me on" (indeed you could misuse dds.transport.UDPv4.builtin.public_address, as described for the NAT traversal property), but this assumes that the address it is giving is in fact reachable by the participants it is talking to. In the extreme case, you can misconfigure A to announce its return address as "localhost", i.e. 127.0.0.1 -- when B tries to respond to that, its responses never leave the B machine -- and so A and B only talk if they are on the same machine (but that's what the shmem built-in is for).

So for B to not be able to reach A is a problem with either A's configuration, or the network infrastructure between B and A.  I would restate your bullets as:

  • A is configured for discovery over multicast address, it uses a [multicast|unicast] return route address, and it also has the address of B
  • B has 2 network interfaces and is reachable from A over the en0 interface via unicast only. The en1 interface should be used for multicast communication with other computers, so B has a different multicast address set, and I additionally set dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list to en0

So again, it depends on what A is reporting as its return address. If it is reporting a multicast address, and it is not reachable via en1, then B's Discovery responses will be denied on en0 by the local configuration, and on en1 by the external network infrastructure (switches, routers, etc).

My expectation is that:

  • A announces Discovery on Multicast, and then on Unicast-to-B. Its return address in both cases is a Multicast address. There is no multicast routing of its address to B's subnet.
  • B does not see the Multicast announcement. It sees the unicast announcement, and attempts to complete Discovery on the Multicast address supplied by A. With deny_multicast_interfaces_list set to en0, the middleware will respond on A's Multicast response address, on en1.
  • As before, the en1 interface and downstream network infrastructure do not have a suitable route to A's Multicast response address, and so A is not reached and Discovery is not satisfied.
  • Likewise, B is announcing Discovery on its own Multicast address on en1, which is still not routable to A.
  • A never receives B's announcements or responses, so it does not know to attempt to complete Discovery.

Can you clarify what your expectations are wrt deny_multicast_interfaces_list, what you are seeing, and why you think this isn't working as expected?

r

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27

Your explanation describes the situation very well and is more consistent than my expectation. My expectation came from a situation where two unconnected networks are used simultaneously. I expected discovery packets to be answered on the same interface they were received on: B receives a discovery packet from A on interface en0, checks whether multicast is enabled for en0, and responds accordingly via multicast or unicast.

The fact that B tries to reach A via every available interface, trying the multicast-enabled interface first, explains everything I was observing.

So, to summarize how discovery works: is the way to configure participant B from my example to disable multicast in that participant completely and to specify the address of B in the discovery peers of the other participants? In that case participant B will be discovered via unicast and will respond via unicast, right? Another solution is probably to check and adjust the routing tables on participant B's host to ensure that specific multicast addresses cannot be sent over en1.

Thank you for the extensive description of how discovery works!

rip
Offline
Last seen: 3 weeks 4 days ago
Joined: 04/06/2012
Posts: 324

We don't have a setting for controlling the outgoing choice of interface -- the network stack will choose one or the other, and this is outside the control of the middleware. You can use the allow/deny_interface_list settings as per this page: http://community.rti.com/kb/how-do-i-restrict-rti-connext-use-only-subset-interfaces, but that's still not fine-grained control. You can prevent DDS from using eth0 or eth1 to get all data to go out on the other. But if both are enabled, then the OS/network stack will pick whichever one is better (as per some algorithm, which might just be "the first one that works").

I still think the problem is in how A is configured, not in anything in B (B should simply be reacting to what it sees from A).  I'm going to get another engineer to look at it and correct/clarify my statements.

 

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27

My assumption that it's related to B's configuration was based on two tests, performed under the condition that A and B cannot reach each other over multicast:

  1. A has multicast and address of B, B has only address of B -> A and B can communicate
  2. A has multicast and address of B, B has multicast and address of B -> A and B cannot communicate

In both cases A has the same configuration and sends discovery information via unicast to B. According to the description of how discovery works, I would assume that B responds via unicast in the 1st test (multicast is disabled) and via multicast in the 2nd test (multicast is enabled because of the multicast address in its discovery peers). I think that A sends a participant discovery packet with both metatraffic_unicast_locators and metatraffic_multicast_locators, and B uses one of these fields.

rip
Offline
Last seen: 3 weeks 4 days ago
Joined: 04/06/2012
Posts: 324

This is true, but what I've since found out, and what may be getting in the way, is that B will use the multicast locators if they are offered, and in that case ignores the unicast ones.

So A does send both its multicast address (because it has one), and its unicast address, and since B can use multicast, it selects the multicast locator and sends its response (which is not routable, apparently).

Would that explain the behavior you are seeing?
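The rule described here can be written down as a toy model and checked against the two tests above. This is only a sketch of the behavior as discussed in this thread, not Connext's implementation; the Participant class, the function names and the addresses are all hypothetical. The assumption is that a responder picks the announcer's multicast locator whenever the announcer offers one and the responder itself has multicast enabled, and otherwise falls back to unicast.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    name: str
    unicast_locator: str
    multicast_locator: Optional[str] = None  # None means multicast is disabled


def reply_locator(announcer, responder):
    """Return (kind, address) the responder uses to answer the announcer."""
    # Assumed rule from the thread: multicast wins if offered and usable.
    if announcer.multicast_locator and responder.multicast_locator:
        return ("multicast", announcer.multicast_locator)
    return ("unicast", announcer.unicast_locator)


def discovery_completes(a, b, multicast_routable):
    """Discovery needs both reply directions to reach their destination."""
    for announcer, responder in ((a, b), (b, a)):
        kind, _address = reply_locator(announcer, responder)
        if kind == "multicast" and not multicast_routable:
            return False  # the reply is dropped by the network
    return True


a = Participant("A", "pc-a", "239.255.0.1")
b_unicast_only = Participant("B", "pc-b")
b_multicast = Participant("B", "pc-b", "239.255.0.2")

test1 = discovery_completes(a, b_unicast_only, multicast_routable=False)
test2 = discovery_completes(a, b_multicast, multicast_routable=False)
print("test 1 (B without multicast):", test1)
print("test 2 (B with multicast):", test2)
```

In this model the only difference between the two tests is whether B has a multicast locator, which matches the observation that changing B's configuration alone decides whether discovery completes.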

 

Offline
Last seen: 6 years 10 months ago
Joined: 04/04/2014
Posts: 27

Yes, exactly :)