Hello,
we are running multiple processes communicating over DDS in the following network:
- Most computers are in the same subnet and they can communicate over multicast - pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4
- There are several computers in another subnet and they cannot be reached via multicast - pc-outside1, pc-outside2
So far, my best configuration has been the following:
- Processes in the same subnet get the default addresses (multicast, shmem, localhost) plus the hostnames outside the subnet:
  - NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2, builtin.udpv4://239.255.0.1
- Processes outside the subnet get the default addresses without multicast, plus the hostnames outside the subnet:
  - NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2
With this configuration all processes can discover each other and communicate successfully. Is a better configuration possible?
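For illustration, the two peer lists above can be generated from a single helper. This is only a minimal sketch: the function name `discovery_peers` and the constants are hypothetical, the host names and the multicast address are the ones from this post, and the value still has to end up in the environment from which the DDS application is started.

```python
import os

# Host names and multicast address are taken from this post;
# the helper itself is hypothetical.
OUTSIDE_HOSTS = ["pc-outside1", "pc-outside2"]
MULTICAST_PEER = "builtin.udpv4://239.255.0.1"


def discovery_peers(in_multicast_subnet):
    """Build the NDDS_DISCOVERY_PEERS value for one process."""
    peers = ["localhost", "shmem://"] + OUTSIDE_HOSTS
    if in_multicast_subnet:
        # Only processes inside the multicast-capable subnet list the group.
        peers.append(MULTICAST_PEER)
    return ", ".join(peers)


if __name__ == "__main__":
    # Example for a process on pc-subnet1 .. pc-subnet4.
    os.environ["NDDS_DISCOVERY_PEERS"] = discovery_peers(in_multicast_subnet=True)
    print(os.environ["NDDS_DISCOVERY_PEERS"])
```

Calling `discovery_peers(False)` gives the list used on pc-outside1 and pc-outside2.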
The problems start when a process outside the subnet is started with a multicast address (even if it's a different address, e.g. builtin.udpv4://224.1.1.1). This process can discover all domain participants but is unable to get any information about publications and subscriptions from the processes running in the subnet. Publications and subscriptions are received only from other processes running outside the subnet (without a multicast address specified). RTI Admin Console shows the same behavior:
- if it's started outside the subnet without a multicast address - all topics are correctly shown
- if it's started outside the subnet with a multicast address - only topics from processes outside the subnet are shown
Is this the intended behavior? Is it possible to have a multicast discovery address and still get data from another subnet? E.g. a configuration like this would be desirable:
- for pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4 (all in one subnet):
  - NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-outside1, pc-outside2, builtin.udpv4://239.255.0.1
- for pc-outside1, pc-outside2 (all in another subnet):
  - NDDS_DISCOVERY_PEERS=localhost, shmem://, pc-subnet1, pc-subnet2, pc-subnet3, pc-subnet4, builtin.udpv4://239.255.0.2
I couldn't get such a configuration working because of the behavior described above.
P.S. accept_unknown_peers is 1; all other QoS options are unchanged.
Thanks!
I think I've found an explanation for this behavior.
The remaining question is: does it apply if application A and application B have different multicast_receive_address values?
tl;dr: the issue isn't the multicast receive address in use, it is in how external network equipment is configured.
The outgoing discovery traffic includes a return route: "use this address". So Application B should respond on A's multicast receive address, assuming that A is using B's unicast address for discovery. This does not mean that they will automatically communicate, because the thing that prevents A from seeing B on the same multicast receive address may also prevent A from seeing B on a different multicast receive address. It's not the 'address', it's the 'multicast'.
Multicast routing is configured with a certain number of "hops" (which is what "the multicast traffic will be blocked" is alluding to). An intervening switch subtracts an arbitrary number (generally 1) from the packet's "hop" count, and is allowed to drop multicast packets once the hops number has decreased to some arbitrary limit (generally 0, but both of these 'arbitrary' numbers can be configured according to the policies in place). The two applications can be on the same multicast address or on different multicast addresses, and still not see each other because the hops count decreased below the limit in a switch.
Because multicast hops is the main issue here, B responding to A on A's (different) multicast-receive-address may still be blocked by intervening equipment.
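As a toy model of the hop-count rule described above (not a description of any particular switch or router), the check can be sketched like this. The function name `survives` and its parameters are hypothetical; `decrement` and `limit` stand for the two configurable 'arbitrary' numbers.

```python
def survives(hops_allowed, devices, decrement=1, limit=0):
    """Return True if a multicast packet crosses `devices` intervening devices.

    hops_allowed: hop count the sender put on the packet.
    decrement:    what each device subtracts (generally 1).
    limit:        the packet is dropped once the count reaches this (generally 0).
    """
    for _ in range(devices):
        hops_allowed -= decrement
        if hops_allowed <= limit:
            return False  # dropped by this device
    return True


if __name__ == "__main__":
    print(survives(1, 0))  # same segment, nothing in between
    print(survives(1, 1))  # one device in between
```

With a hop count of 1 the packet only reaches receivers with no such device in between, which is the same outcome regardless of which multicast address is used.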
rip
Thanks for the explanation! Could you please confirm my understanding of the A<->B discovery case?
B has multicast enabled with any multicast_receive_address, and after B receives discovery information from A with another multicast address, B still tries to communicate via multicast?
I have one more question regarding dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list and discovery
A and B cannot communicate in this configuration and I assume that B tries to reach A over en0 via multicast.
What does dds.transport.UDPv4.builtin.parent.deny_multicast_interfaces_list do? Does it affect only data communication or discovery as well? My expectation would be that this option can be used to disable multicast discovery on one of the interfaces, but it doesn't seem to work.
Well, here we are getting deep into networking neepery (a wonderful word that brings together geek, nerd and expert knowledge of a single subject), at which level I do not exist, sorry :)
B will respond on A's reported return-route address (which might be either unicast or multicast, depending on configuration). This is similar behavior to using the new (in 5.1.0) UDP NAT-traversal configuration, by the way; see the discussion of that in the What's New and Table 15-2 of the User's Manual.
A says "This is the address you can reach me on" (indeed you could misuse the dds.transport.UDPv4.builtin.public_address as described by the NAT traversal property), but this assumes that the address it is giving is in fact reachable by the participants it is talking to. In the extreme case, you can misconfigure A to announce its return address as "localhost", i.e. 127.0.0.1 -- when B tries to respond to that, its responses never leave the B machine -- and so A and B only talk if they are on the same machine (but that's what the shmem built-in is for).
So for B to not be able to reach A is a problem with either A's configuration, or the network infrastructure between B and A. I would restate your bullets as:
So again, it depends on what A is reporting as its return address. If it is reporting a multicast address, and it is not reachable via en1, then B's discovery responses will be denied on en0 by the local configuration, and on en1 by the external network infrastructure (switches, routers, etc).
My expectation is that:
Can you clarify what your expectations are wrt deny_multicast_interfaces_list, what you are seeing, and why you think this isn't working as expected?
r
Your explanation describes the situation very well and is more consistent than my expectation. My expectation came from a situation where two unconnected networks are used simultaneously: I assumed that discovery packets are answered on the same interface they were received on, so that B receives a discovery packet from A on interface en0, checks whether multicast is enabled for en0, and responds accordingly via multicast or unicast.
The fact that B tries to reach A via every available interface, trying the multicast-enabled interface first, explains all the phenomena I was observing.
So, summarizing how discovery works: the way to configure participant B from my example is to disable multicast in that participant completely and to specify the address of B in the discovery_peers of the other participants? In this case participant B will be discovered via unicast and will respond via unicast, right? Another solution is probably to check and adjust the routing tables on participant B's host to ensure that the specific multicast addresses cannot be sent over en1.
Thank you for extensive description of how discovery works!
We don't have a setting for controlling the outgoing choice of interface -- the network stack will choose one or the other, and this is outside the control of the middleware. (You can use the allow/deny_interface_list settings as per this page: http://community.rti.com/kb/how-do-i-restrict-rti-connext-use-only-subset-interfaces, but that's still not fine-grained control.) You can prevent DDS from using eth0 or eth1 to get all data to go out on the other. But if both are enabled, then the OS/network stack will pick whichever one is better (as per some algorithm, which might just be "the first one that works").
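To make the coarse-grained nature of that control concrete, here is a rough model of an allow/deny filter over interface names. It is a sketch under assumptions that are mine, not confirmed in this thread: that the allow list is applied first and the deny list after it, and that entries may contain shell-style wildcards. The function `usable_interfaces` is hypothetical and is not part of any RTI API.

```python
from fnmatch import fnmatchcase


def usable_interfaces(interfaces, allow=(), deny=()):
    """Return the interfaces left after applying allow, then deny (assumed order)."""
    selected = []
    for name in interfaces:
        # An empty allow list is treated as "allow everything".
        if allow and not any(fnmatchcase(name, pat) for pat in allow):
            continue
        if any(fnmatchcase(name, pat) for pat in deny):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    print(usable_interfaces(["eth0", "eth1"], deny=["eth1"]))
    print(usable_interfaces(["eth0", "eth1"]))
```

In this model the filter only decides which interfaces are available at all; when both remain, the choice between them is still made by the OS.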
I still think the problem is in how A is configured, not in anything in B (B should simply be reacting to what it sees from A). I'm going to get another engineer to look at it and correct/clarify my statements.
My assumption that it's related to B's configuration was based on two tests, performed under the condition that A and B cannot reach each other over multicast:
In both cases A has the same configuration and sends discovery information via unicast to B. According to the description of how discovery works, I would assume that B responds via unicast (multicast is disabled) in the 1st test and via multicast (multicast is enabled due to the multicast address in its discovery peers) in the 2nd test. I think that A sends a participant discovery packet with both metatraffic_unicast_locators and metatraffic_multicast_locators, and B uses one of these fields.
This is true, but (what I've since found out, and) what may be getting in the way is that B will use the multicast locators if they are offered, and in this case ignore the unicast ones.
So A does send both its multicast address (because it has one), and its unicast address, and since B can use multicast, it selects the multicast locator and sends its response (which is not routable, apparently).
Would that explain the behavior you are seeing?
Yes, exactly :)
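For anyone landing on this thread later, the selection rule discussed in the last few posts can be modeled roughly as follows. This is a simplified sketch of the behavior as described here, not the middleware's actual code; the function `reply_locators` and the example addresses for A are hypothetical.

```python
def reply_locators(unicast, multicast, multicast_enabled):
    """Pick where B sends its discovery response to A.

    unicast / multicast: what A announced in metatraffic_unicast_locators and
    metatraffic_multicast_locators. If B has multicast enabled and A offered
    multicast locators, B uses those and ignores the unicast ones.
    """
    if multicast_enabled and multicast:
        return list(multicast)
    return list(unicast)


if __name__ == "__main__":
    a_unicast = ["10.0.1.5"]       # hypothetical address of A
    a_multicast = ["239.255.0.2"]  # A's multicast receive address

    # 1st test: B has multicast disabled, so it answers on A's unicast address.
    print(reply_locators(a_unicast, a_multicast, multicast_enabled=False))
    # 2nd test: B has multicast enabled, so it answers on A's multicast
    # address, which is not routable between the two subnets.
    print(reply_locators(a_unicast, a_multicast, multicast_enabled=True))
```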