I am writing a (modern C++) module to simulate a device that connects to a central registry to send and obtain information. The registry and device each pass data both ways, so each acts as both subscriber and publisher. The device sends the first request message, and the registry returns a suitable response (or should do).
When both apps are on the same machine, it works 100%. Alas, that is not a very good proof of concept. I also have two PCs connected through a WiFi dongle, with one running the registry and the other the 'device'. They ping nicely. But when I run it across the two machines, it works sometimes but not always. I am running DDS Spy on both machines and I see that the initial message from the 'device' to the registry usually gets through (with 'D' entries in the Info column). However, the return message to the device is often lost. When this happens, I see no 'D' data messages in the Spy output on the device machine (but they do exist in the Spy output on the registry machine). So the response doesn't even reach the 'device' machine, let alone the application.
So a few questions:
- Are there known issues using DDS over a WiFi connection?
- If it should be working correctly, where is the problem? The fact that DDS Spy doesn't detect the response on the device machine suggests it is in the bowels of DDS, namely the code created by the Code Generator (you know, the code with the warnings not to mess with it). So although this sounds like a cop-out, I don't see how my code can affect this.
- Am I wrong here?
- If I can change things at my end, would I do it via the QoS file or the C++ code? If so, can someone suggest where?
This is my first DDS program so any help much appreciated.
His Nerdship,
1. Connext DDS should work over WiFi without issues.
2. It's possible that the problem could be related to the network itself or your QoS configuration.
One of the most likely culprits for the issue you are describing is a firewall. If you have a firewall running on either of the PCs involved in the communication, could you try disabling it?
If that doesn't work I'll need some more information from you:
When running on the same PC, are they communicating via shared memory or UDP?
If you haven't explicitly configured the transport_builtin mask, the two applications will attempt to communicate using every locator in the initial_peers list (which by default includes both UDP and SHMEM locators).
You said that you can ping between the two devices. Was this using the rtiddsping tool shipped with Connext DDS? If not, please run a test with it and let me know the results.
Would you be able to obtain a Wireshark capture for me (preferably one on each PC, started before the applications)?
Sam
Thanks for answering, Sam,
I will let you know the result. It is Friday night here in Oz so it will be tomorrow.
Just to completely rule out the firewall could you try disabling it for one test?
Copying the relevant snippet below from the Knowledge Base article "Why are my reader and writer applications unable to communicate?"
Sam
Hi Sam,
I think I have found the problem.
When the registry sets up a DataWriter to send the topic back to the device, there is a delay between its creation and it being ready to write. If you call
DataWriter::write(const T& data)
too soon, it just returns without doing anything. Because it is a void function, and because it is non-blocking, there is no way to ensure it has done its stuff. When sending between the two apps on the same machine, the DataWriter clearly readies itself very quickly, so it is ready to write. However, over a WiFi link it obviously needs more time. Much more.
I saw a forum post about this, where they suggest putting in a delay:
https://community.rti.com/forum-topic/dynamic-data-writing-seems-need-setup-time
I had originally put in a delay of 500ms, so it occasionally worked. When I increased this to 2 seconds, it worked every time (sometimes after 1 - 2 attempts).
However, this is a pretty crude workaround. I will post another question re a more elegant approach.
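One possible alternative to a fixed delay, sketched here under assumptions rather than taken from this thread: poll until the DataWriter reports at least one matched reader, and only then call write(). The helper below is hypothetical and uses only the standard library; the name wait_for_matched_reader, the callable parameter, and the 50 ms poll interval are all illustrative. As far as I understand the modern C++ API, the callable could wrap writer.publication_matched_status().current_count(), but treat that as an assumption to check against the API reference.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `matched_count` until it reports at least one
// matched reader or `timeout` elapses. Returns true if a match was seen,
// false if the deadline passed first.
//
// Assumption: with the Connext modern C++ API the callable could be
//   [&] { return writer.publication_matched_status().current_count(); }
inline bool wait_for_matched_reader(
    const std::function<int()>& matched_count,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(50))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (matched_count() > 0) {
            return true;   // a reader has been discovered; safe to write
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up; caller decides whether to retry
        }
        std::this_thread::sleep_for(poll_interval);
    }
}
```

The registry would call this once after creating the DataWriter and skip or retry the write if it returns false. Compared with a hard-coded sleep, the wait ends as soon as the match is reported, so it costs little on the same machine and stretches only as far as the WiFi link needs.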