Unreliable connection over WiFi.

His Nerdship

I am writing a (modern C++) module to simulate a device that connects to a central registry to send and obtain information.  The registry and device each pass data both ways, so each acts as both a subscriber and a publisher.  The device sends the first request message, and the registry returns a suitable response (or should do).

When both apps are on the same machine, it works 100%.  Alas, that is not a very good proof of concept.  I also have two PCs connected through a WiFi dongle, with one running the registry and the other the 'device'.  They ping nicely.  But when I run it across the two machines, it works sometimes but not always.  I am running DDS Spy on both machines and I see the initial message from the 'device' to the registry usually gets through (with 'D' entries in the Info column).  However, the return message to the device is often lost.  When this happens, I see no 'D' data messages in the Spy output on the device machine (but they do exist in the Spy output on the registry machine).  So the response doesn't even reach the 'device' machine, let alone the application.

So a few questions:

  1. Are there known issues using DDS over a WiFi connection?
  2. If it should be working correctly, where is the problem?  The fact that DDS Spy doesn't detect the response on the device machine suggests it is in the bowels of DDS, namely the code created by the Code Generator (you know, the code with the warnings not to mess with it).  So although this sounds like a cop-out, I don't see how my code can affect this.
  3. Am I wrong here?
  4. If I can change things at my end, would I do it via the QoS file or the C++ code?  If so, can someone suggest where?

This is my first DDS program, so any help is much appreciated.


His Nerdship,

1. Connext DDS should work over WiFi without issues.

2. It's possible that the problem could be related to the network itself or your QoS configuration.

One of the most likely culprits for the issue you are describing is a firewall. If you have a firewall running on either of the PCs involved in the communication, could you try disabling it?

If that doesn't work I'll need some more information from you:

When running on the same PC, are they communicating via shared memory or UDP?
If you haven't explicitly configured the transport_builtin mask, the two applications will attempt to communicate using every locator in the initial_peers list (which by default includes both UDP and SHMEM locators).

You said that you can ping between the two devices - was this using the rtiddsping tool shipped with Connext DDS? If not, please run a test with this and let me know the results.

Would you be able to obtain a Wireshark capture for me (preferably one on each PC, started before the applications)?

Sam

His Nerdship

Thanks for answering, Sam,

  • Re the firewall, I do indeed have Norton Security running, but it is possible to allow custom access to the Internet for named programs, which I have done.  Besides, surely a firewall would prevent all access, whereas I find it sometimes works (wouldn't be much of a firewall if it let some through!).  And the failures are overwhelmingly on the return journey from the registry app to the device app.
  • Shared memory or UDP - sorry I just did what the tutorials recommended and let the Code Generator create the basic app.  Apart from the publisher and subscriber code modules, they all have a very stark 'Do not modify' warning.  So I am not sure of the internal workings - I have enough on my plate mastering the outside, let alone the inside!
  • I pinged using the regular ping command from a DOS window.  I will try rtiddsping.
  • I will look into the transport_builtin mask.  Thanks for suggesting it.

I will let you know the result.  It is Friday night here in Oz so it will be tomorrow.


Just to completely rule out the firewall, could you try disabling it for one test?
I am copying the relevant snippet below from the Knowledge Base article "Why are my reader and writer applications unable to communicate?":


Firewalls. 
If there is a firewall between the two machines, RTI Connext may not be able to communicate because the firewall does not allow packets on the required UDP ports through the firewall. Disabling the firewall or modifying the configuration to allow the required ports will enable communication. If multicast communication is required, such as in the default discovery peer configuration, you must ensure that the multicast groups being used are forwarded by the firewall as well.
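
To make "the required UDP ports" concrete, here is a minimal sketch of the default RTPS port mapping. It assumes the default port-mapping constants from the RTPS wire-protocol specification (port base 7400, domain gain 250, participant gain 2, offsets 0/10/1/11), which Connext is understood to use unless its well-known-ports settings have been changed. The names `RtpsPorts` and `default_rtps_ports` are hypothetical and not part of any RTI API; verify the results against your own configuration before editing firewall rules.

```cpp
#include <cassert>

// Default RTPS port-mapping constants (assumed unchanged from the defaults).
constexpr int kPortBase = 7400;         // PB
constexpr int kDomainIdGain = 250;      // DG
constexpr int kParticipantIdGain = 2;   // PG
constexpr int kD0 = 0;                  // discovery multicast offset
constexpr int kD1 = 10;                 // discovery unicast offset
constexpr int kD2 = 1;                  // user-data multicast offset
constexpr int kD3 = 11;                 // user-data unicast offset

// Hypothetical helper type: the four UDP ports one participant listens on.
struct RtpsPorts {
    int discovery_multicast;
    int discovery_unicast;
    int user_multicast;
    int user_unicast;
};

// Computes the default ports for a domain ID and a participant index
// (0 for the first participant on a host, 1 for the second, and so on).
inline RtpsPorts default_rtps_ports(int domain_id, int participant_id) {
    assert(domain_id >= 0 && participant_id >= 0);
    const int base = kPortBase + kDomainIdGain * domain_id;
    RtpsPorts ports;
    ports.discovery_multicast = base + kD0;
    ports.discovery_unicast = base + kD1 + kParticipantIdGain * participant_id;
    ports.user_multicast = base + kD2;
    ports.user_unicast = base + kD3 + kParticipantIdGain * participant_id;
    return ports;
}
```

For domain 0 and participant index 0 this gives UDP ports 7400, 7410, 7401 and 7411. A second participant on the same host would use 7412 and 7413 for its unicast ports, so a firewall rule generally needs to cover a small range rather than a single port.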

Regards,
Sam
 
His Nerdship

Hi Sam,

I think I have found the problem.

When the registry sets up a DataWriter to send the topic back to the device, there is a delay between its creation and it being ready to write.  If you call DataWriter::write(const T& data) too soon it just returns without doing anything.  Because it is a void function, and because it is non-blocking, there is no way to ensure it has done its stuff.

When sending between the two apps on the same machine, the DataWriter clearly readies itself very quickly, so it is ready to write.  However over a WiFi link, it obviously needs more time.  Much more.

I saw a forum post about this, where they suggest putting in a delay:
https://community.rti.com/forum-topic/dynamic-data-writing-seems-need-setup-time

I had originally put in a delay of 500 ms, so it occasionally worked.  When I increased this to 2 seconds, it worked every time (sometimes after 1-2 attempts).

However, this is a pretty crude workaround.  I will post another question re a more elegant approach.
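
One common alternative to a fixed sleep is to poll a readiness condition with a deadline, so the code waits only as long as discovery actually takes. The sketch below is not the author's code and does not call any RTI API: `wait_until_ready` is a hypothetical helper and `FakeWriter` is a stand-in so the example runs on its own. With a real modern C++ DataWriter, the predicate would presumably be something like `writer.publication_matched_status().current_count() > 0`; treat that as an assumption to check against the API reference.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `ready` until it returns true or `timeout` elapses.
// Returns true if the condition was met, false if the deadline passed first.
inline bool wait_until_ready(
        const std::function<bool()>& ready,
        std::chrono::milliseconds timeout,
        std::chrono::milliseconds poll_interval = std::chrono::milliseconds(50)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up; the caller decides how to report it
        }
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}

// Hypothetical stand-in for a DataWriter: reports zero matched readers
// for the first `polls_before_match` queries, then reports one.
struct FakeWriter {
    int polls_before_match;

    int matched_reader_count() {
        if (polls_before_match > 0) {
            --polls_before_match;
            return 0;
        }
        return 1;
    }
};
```

A caller would wrap the check in a lambda, for example `wait_until_ready([&] { return writer.matched_reader_count() > 0; }, std::chrono::seconds(5))`, and only write once it returns true. Note that a matched reader means discovery has completed, not that a given sample will arrive; delivery guarantees still depend on the reliability settings in the QoS.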