RTI DDS on Windows XP

Mon, 02/25/2013 - 14:58

#2

Gerardo Pardo

Offline

Last seen: 10 months 1 week ago

Joined: 06/02/2010

Posts: 602

Hi,

I am not aware of any issues of that nature specific to Windows XP. Have you checked that your firewall is disabled on both computers? Are you using the default NDDS_DISCOVERY_PEERS on both sides? If not, are they set consistently on both machines?

Are the applications that you run on either computer the same? What about their QoS configurations?

Gerardo

By the way, I noticed an earlier post you had regarding issues using the create_participant_with_profile() function in Linux. I was actually trying to reproduce it but the posting has been deleted. Did you figure out the problem? If so, it may be helpful if you posted a note on what the issue was and how you tracked it down/solved it so that others may benefit...

Mon, 02/25/2013 - 15:20

#3

jhewell

Offline

Last seen: 5 years 10 months ago

Joined: 11/30/2012

Posts: 18

Hi Gerardo,

Sorry - I was hoping I pulled that other post down before anybody spent any time looking at it. The problem was caused by the guy typing this reply (i.e., stupid user error), so I didn't post to let everyone in on that. The function can't find a profile if it can't find the USER_QOS_PROFILES.xml file. :-/

As far as the Windows XP issue, the firewalls are off on all systems. The applications are the same across the systems. Our USER_QOS_PROFILES.xml is attached.

Jim

File Attachments:

USER_QOS_PROFILES.xml

Mon, 02/25/2013 - 16:18

#4

Gerardo Pardo

Hi,

I do not see anything in your QoS profiles that could justify that kind of behavior. In fact as far as I can see your QoS profile is really not setting anything other than disabling the shared memory transport and setting the participant name/role.

One thing that seemed odd is that you are configuring all the profiles with the "is_default_qos" as true. Normally only one QoS profile should be tagged with "is_default_qos=true". The is_default_qos is intended to mark the QoS profile to use in case your application does not specify any name. So only one will apply. If you tag multiple then only the last one appearing in the file will be treated as the "default" one. This has no adverse consequences but it is a bit confusing I think.
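A quick way to spot this is to scan the XML file for every profile tagged as default. The following is a minimal Python sketch, not an RTI tool; the library and profile names in SAMPLE are made up for illustration, and it assumes the usual qos_library / qos_profile element layout. It applies the "last one in the file wins" rule described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical QoS file content; names are illustrative only.
SAMPLE = """<dds>
  <qos_library name="ExampleLibrary">
    <qos_profile name="ProfileA" is_default_qos="true"/>
    <qos_profile name="ProfileB" is_default_qos="true"/>
    <qos_profile name="ProfileC"/>
  </qos_library>
</dds>"""


def default_profiles(xml_text):
    """Return 'Library::Profile' for every profile tagged is_default_qos=true,
    in the order the profiles appear in the file."""
    root = ET.fromstring(xml_text)
    found = []
    for library in root.iter("qos_library"):
        for profile in library.iter("qos_profile"):
            flag = profile.get("is_default_qos", "false").strip().lower()
            if flag in ("true", "1"):
                found.append(f"{library.get('name')}::{profile.get('name')}")
    return found


def effective_default(xml_text):
    """Per the rule above, the last tagged profile is the one that applies."""
    tagged = default_profiles(xml_text)
    return tagged[-1] if tagged else None


if __name__ == "__main__":
    tagged = default_profiles(SAMPLE)
    if len(tagged) > 1:
        print("More than one default profile:", ", ".join(tagged))
    print("Effective default:", effective_default(SAMPLE))
```

To check a real file, read its text and pass it to default_profiles() instead of SAMPLE.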

Given that all your QoS are essentially the default ones, I would still suspect some kind of network config/firewall issue. Can you try just with the standard rtiddsping / rtiddsspy applications and see if you see the same problem? In fact, what I would try is to run ping on one computer against the other and explicitly pass the -peer parameter on the command line in this manner:

Computer1> rtiddsping -sub -transport 1 -peer 10.10.10.10

Note: I am intentionally trying to use an IP address that will not match the IP of the other computer to see the effect of the other side communicating.

Once the subscriber application is running and waiting then start the other side:

Computer2> rtiddsping -pub -transport 1 -peer 239.255.0.1

Note: The publisher is started with peers set to the multicast address that the subscriber application should be listening to (it is the default multicast all applications listen to, unless otherwise specified). You should immediately see data being received on the subscriber side.

Note: For this test it is important to start them in the order I specified: first Computer1 and then Computer2. That way the initial announcements that Computer2 sends to the multicast address will be received by Computer1, which is already running.

RTI Data Distribution Service Ping built with NDDS version 1.6a.00--C1.6a.00--C++1.6a.00
Copyright 2012 Real-Time Innovations, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NddsPing is listening for data, press CTRL+C to stop it.
Found 1 additional ping publishers(s).
Current publisher tally is: 1
Found 1 additional alive ping publishers(s).
Current alive publisher tally is: 1
NddsPing, issue received: 0000000
NddsPing, issue received: 0000001
NddsPing, issue received: 0000002

If this works, it proves that multicast works well from Computer2 -> Computer1. Then try the reverse:

First on Computer2:

Computer2> rtiddsping -sub -transport 1 -peer 10.10.10.10

Then on Computer1:

Computer1> rtiddsping -pub -transport 1 -peer 239.255.0.1

See if that also works.

Gerardo

Mon, 02/25/2013 - 16:55

#5

jhewell

Gerardo,

That works just fine in each direction regardless of Windows 7 or Windows XP on either end.

More specifically what we are seeing...

We have 2 identical applications exchanging an application level heart beat - each app sends it out at 1 second intervals. The problem we're seeing is that the Windows XP machine is seeing the heartbeat come in about once every 75 seconds even though we know the sender (Windows 7) is transmitting at 1 second intervals. The other way seems okay. It just seems that once discovery is done the actual exchange of messages is very delayed.

Thanks,
Jim

Mon, 02/25/2013 - 21:20

#6

Gerardo Pardo

Hi Jim,

This is very odd. I cannot think of anything that would cause that other than the sending application not really sending the message every second... But I assume that you have already tried to put a printf() or something similar that verifies that you are actually calling DataWriter.write() once per second...

Certainly I do not suspect discovery. Even if discovery were having issues, once you have discovered the application and received the first message it would not time out for several minutes, so you would get a continuous stream of heartbeats for a while and then nothing... So the behavior you describe cannot be explained by this.

What is the reading application doing? Did it install a listener to capture the heartbeat? Is it waiting on a WaitSet? If a WaitSet, did you attach some condition that would make the DataReader wake up when the heartbeat message arrives? I am shooting in the dark a bit here because this would make no difference running on Windows XP or Windows 7.

This is what I would do to trouble-shoot it:

(1) If you have not done so, put a printf() after the DataWriter.write() operation and print the return value. This will ensure you are actually writing each second and that the return code is RETCODE_OK.

(2) Try to reproduce it using rtiddsping. It takes a parameter that allows you to configure the send rate. I assume this will work fine...

(3) Run your application on Windows 7 as normal and "rtiddsspy -print" on the Windows XP. Does it get the sample you send each second or not?

(3.1) If rtiddsspy on Windows XP is not getting the messages from the Windows 7 machine, then run Wireshark. You can use the one that comes with the RTI Connext installation or download one from http://www.wireshark.org/. Both come with a dissector for RTPS packets installed. Capture the packets and verify that your sending application is sending the message once per second to the receiver application... If you want, you can save one of the captures (the PCAP file) and attach it to this thread and I can take a quick look at it.

(3.2) If rtiddsspy is getting the messages once per second, then I would suspect the reception logic. If you are not using a listener on the heartbeat DataReader, I would start by trying that just to make sure it works that way, and then transition to the logic you are using.
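Step (1) above can be sketched roughly as follows. This is only an illustration in Python: write_sample() is a hypothetical stand-in for the application's own DataWriter write call, and the return codes are modeled as plain strings. The idea is to record a timestamp and return code for every write, then flag any write that failed or did not happen about one period after the previous one.

```python
import time


def write_sample(sample):
    """Hypothetical stand-in for the application's DataWriter write call."""
    return "RETCODE_OK"


def logged_write(sample, log, clock=time.monotonic):
    """Write one sample and record (timestamp, return code) in the log."""
    code = write_sample(sample)
    log.append((clock(), code))
    return code


def check_write_log(log, period=1.0, tolerance=0.25):
    """Return a list of human-readable problems found in the write log."""
    problems = []
    for i, (timestamp, code) in enumerate(log):
        if code != "RETCODE_OK":
            problems.append(f"write {i} returned {code}")
        if i > 0:
            gap = timestamp - log[i - 1][0]
            if abs(gap - period) > tolerance:
                problems.append(f"write {i} came {gap:.2f}s after the previous one")
    return problems


if __name__ == "__main__":
    log = []
    for n in range(3):
        logged_write({"heartbeat": n}, log)
        time.sleep(1.0)
    print(check_write_log(log) or "all writes on schedule")
```

An empty result means the sender really is writing once per period with RETCODE_OK, which shifts suspicion to the network or the receiving side.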

Gerardo

Tue, 02/26/2013 - 09:15

#7

jhewell

Hi Gerardo,

When we run rtiddsspy -print, at first it shows all message types periodically, then it starts getting slower and gets nothing after about 10 seconds. If I restart rtiddsspy, it does the same thing again.

I have attached a copy of the DDSListener we are using to receive these messages. So we are using a pretty standard listener. Further investigation makes me believe, however, that something is wrong with our listener. We logged the reads vs. writes across our network for the heartbeats and it is hardly a 1-1 correlation. Sometimes, with one write, our listener gets 8 reads. And there are times we will write many times without a single read. So I'm wondering if something is causing the listener to stall and then attempt to catch up, because eventually we do get a bunch of reads.

One other file I attached is what we see periodically from rtiddsspy.

Appreciate your assistance!!

Jim

File Attachments:

DPS_DDSListener.h

rtiddsspy.txt

Tue, 02/26/2013 - 11:29

#8

Gerardo Pardo

Hi,

I took a look at your listener. Nothing really stands out there.

The dump from rtiddsspy you attached is not showing any data. All the traffic shown there is discovery traffic. Data traffic should have a 'd' or a 'D' in the Info column (see below); the 'W' and 'R' entries just indicate discovery traffic, which is not periodic.

 annapurna:apps_rti gerardo$ RTI500/ndds.5.0.0/scripts/rtiddsspy 

RTI Data Distribution Service Spy built with NDDS version 1.6a.00--C1.6a.00--C++1.6a.00
Copyright 2012 Real-Time Innovations, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NddsSpy is listening for data, press CTRL+C to stop it.

source_timestamp   Info  Src HostId  topic               type              
-----------------  ----  ----------  ------------------  ------------------  
1361905286.877231  W +N  0A1E017C    PingTopic           PingType          
1361905418.907823  d +N  0A1E017C    PingTopic           PingType          
1361905419.908164  d +M  0A1E017C    PingTopic           PingType          
1361905420.908355  d +M  0A1E017C    PingTopic           PingType

If you run rtiddsspy -hOutput it will display help on what the letters and symbols in the Info column mean. For example, "W" means a DataWriter was discovered and "N" means this DataWriter was not seen before by rtiddsspy. "d" means data was received (for an unkeyed topic; otherwise it would be "D"). The first data sample has "+N"; the remaining ones have "+M", indicating that the instance (in this case, with no key, there is just one instance) was already seen and is now modified by the received sample.

In your rtiddsspy log all I see is "R ?M" and "W ?M". These all indicate a loss of discovery liveliness. In other words, rtiddsspy is not getting the liveliness messages from the other application. I wonder if you are getting the initial discovery and nothing else afterwards... How many NICs (network interfaces) do your computers have? If you have more than 4 it could be a problem. RTI Connext DDS will only use the first 4; if you had, say, 8 and the first 4 were "internal addresses" that are not reachable from the other computer, then multicast messages would go through but not the unicast ones.
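To check a long rtiddsspy log quickly, the Info column can be tallied with a few lines of script. The sketch below is Python and is an assumption on my part, not part of the RTI tools: it relies only on the column layout shown in the output above and on the meaning of 'd'/'D' (data) and 'W'/'R' (discovery) just described. Any other code is counted as "other" rather than interpreted.

```python
from collections import Counter

# Sample lines copied from the rtiddsspy output shown above.
SAMPLE_LOG = """source_timestamp   Info  Src HostId  topic               type
-----------------  ----  ----------  ------------------  ------------------
1361905286.877231  W +N  0A1E017C    PingTopic           PingType
1361905418.907823  d +N  0A1E017C    PingTopic           PingType
1361905419.908164  d +M  0A1E017C    PingTopic           PingType
1361905420.908355  d +M  0A1E017C    PingTopic           PingType
"""


def classify_spy_line(line):
    """Return 'data', 'discovery', 'other', or None for non-record lines."""
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        float(fields[0])  # record lines start with a numeric timestamp
    except ValueError:
        return None  # header, separator, or banner line
    code = fields[1]
    if code in ("d", "D"):
        return "data"
    if code in ("W", "R"):
        return "discovery"
    return "other"


def summarize(log_text):
    """Count record lines by kind."""
    kinds = (classify_spy_line(line) for line in log_text.splitlines())
    return Counter(kind for kind in kinds if kind is not None)


if __name__ == "__main__":
    print(dict(summarize(SAMPLE_LOG)))
```

A log whose tally contains no "data" entries at all, as in the attached dump, means the spy saw discovery traffic only.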

Let me recap to make sure I understand what you are saying. Let me know if any of these statements are wrong:

(1) If you run (in isolation):

Windows7> rtiddsping -sendPeriod 1

WindowsXP> rtiddsspy -print

This works as expected, rtiddsspy keeps getting data every 1 second and continues to do so over time.

Can you verify this? The output you sent from rtiddsspy does not show this.

(2) If you run (in isolation):

Windows7> yourPublisherApplication

WindowsXP> rtiddsspy -print

This receives data periodically as expected, but only for the first 10 seconds or so. Then nothing afterwards.

If you leave the publisher application running and restart rtiddsspy, the same behavior is repeated: it gets data for 10 seconds or so, then no data.

(3) If you run (in isolation):

Windows7> yourPublisherApplication

WindowsXP> yourSubscribingApplication

Then you get strange behaviors. There is no 1-to-1 correlation between reads and writes. Sometimes a single write results in 8 reads. Sometimes many writes result in no reads.

I am still quite puzzled with all this. If (1) is really true then it cannot really be the listener code because rtiddsspy is exhibiting the same behavior. So this would point to the publisher; some sort of blocking behavior... But if you are logging the writes and they are succeeding then that cannot be explained either...

Are you able to do a Wireshark capture that exhibits scenario (1)?

Gerardo

Wed, 02/27/2013 - 10:29

#9

jhewell

Gerardo,

It turns out the issue was caused by an internal application message queue falling behind. Thus, it appeared the messages were not arriving, but they were instead getting stuck in the recv message queue. Each received message was being handled with a database access, and Windows XP turned out to be EXTREMELY slow on those accesses, causing the message queue to fill up. On Windows 7 the queue rarely had more than 1 message in it.
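For anyone who hits something similar, the effect is easy to reproduce on paper. Below is a minimal Python sketch of a single FIFO queue drained by one handler. The numbers are purely illustrative and are not measurements from our systems. Whenever the per-message handling time exceeds the arrival period, the delay seen by each new message keeps growing, which from the outside looks like messages "not arriving".

```python
def simulate_queue(arrival_period, handling_time, n_messages):
    """Return the delay (arrival to end of handling) for each message.

    Messages arrive every `arrival_period` seconds and are handled one at a
    time, in order, each taking `handling_time` seconds.
    """
    previous_finish = 0.0
    delays = []
    for k in range(n_messages):
        arrival = k * arrival_period
        start = max(arrival, previous_finish)  # wait for the handler to be free
        previous_finish = start + handling_time
        delays.append(previous_finish - arrival)
    return delays


if __name__ == "__main__":
    # Fast handler: every message is processed before the next one arrives.
    print("fast:", simulate_queue(1.0, 0.25, 5))
    # Slow handler: each message waits longer than the one before it.
    print("slow:", simulate_queue(1.0, 1.5, 5))
```

Logging the queue depth (or the age of the oldest queued message) inside the application would have pointed at this much sooner than looking at the network did.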

Thanks for your assistance - using the suggestions you provided was key to helping us resolve the issue. As always, your help was fantastic!

Jim
