Are there any known issues with running RTI DDS on Windows XP? We are running services that exchange messages on networks with both Windows 7 and Windows XP machines. The Windows 7 machines are working well whereas the Windows XP machines are seeing connection/discovery issues on a regular basis. Any special precautions/configurations needed when running on XP?
Thanks!
Hi,
I am not aware of any issues of that nature specific to Windows XP. Have you checked that the firewall is disabled on both computers? Are you using the default NDDS_DISCOVERY_PEERS on both sides? If not, are they set consistently on both machines?
Are the applications that you run on both computers the same? What about their QoS configurations?
Gerardo
By the way, I noticed an earlier post of yours about issues using the create_participant_with_profile() function on Linux. I was actually trying to reproduce it, but the post has been deleted. Did you figure out the problem? If so, it would be helpful if you posted a note on what the issue was and how you tracked it down and solved it, so that others may benefit.
Hi Gerardo,
Sorry - I was hoping I pulled that other post down before anybody spent any time looking at it. The problem was caused by the guy typing this reply (i.e., stupid user error), so I didn't post to let everyone in on that. The function can't find a profile if it can't find the USER_QOS_PROFILES.xml file. :-/
As far as the Windows XP issue, the firewalls are off on all systems. The applications are the same across the systems. Our USER_QOS_PROFILES.xml is attached.
Jim
Hi,
I do not see anything in your QoS profiles that could justify that kind of behavior. In fact as far as I can see your QoS profile is really not setting anything other than disabling the shared memory transport and setting the participant name/role.
One thing that seemed odd is that you are configuring all the profiles with the "is_default_qos" as true. Normally only one QoS profile should be tagged with "is_default_qos=true". The is_default_qos is intended to mark the QoS profile to use in case your application does not specify any name. So only one will apply. If you tag multiple then only the last one appearing in the file will be treated as the "default" one. This has no adverse consequences but it is a bit confusing I think.
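A quick way to spot this in a profile file is to list every profile flagged as default. The following is only a rough sketch in Python: the sample XML, the library and profile names, and the helper name default_profiles are made up for illustration, and it assumes the qos_profile elements carry no XML namespace prefix.

```python
import xml.etree.ElementTree as ET

def default_profiles(xml_text):
    """Return the names of all qos_profile elements tagged is_default_qos."""
    root = ET.fromstring(xml_text)
    names = []
    for profile in root.iter("qos_profile"):
        # Assumption: the flag is written as "true" or "1".
        if profile.get("is_default_qos", "false").lower() in ("true", "1"):
            names.append(profile.get("name"))
    return names

# Hypothetical profile file with two profiles tagged as default.
SAMPLE = """
<dds>
  <qos_library name="ExampleLibrary">
    <qos_profile name="ProfileA" is_default_qos="true"/>
    <qos_profile name="ProfileB" is_default_qos="true"/>
    <qos_profile name="ProfileC"/>
  </qos_library>
</dds>
"""

flagged = default_profiles(SAMPLE)
if len(flagged) > 1:
    print("More than one default profile:", ", ".join(flagged))
```

To check a real file, the same function could be fed the contents of USER_QOS_PROFILES.xml read from disk.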
Given that all your QoS settings are essentially the default ones, I would still suspect some kind of network configuration or firewall issue. Can you try with just the standard rtiddsping / rtiddsspy applications and see if you get the same problem? In fact, what I would try is to run rtiddsping on one computer against the other and explicitly pass the -peer parameter on the command line, in this manner:
Computer1> rtiddsping -sub -transport 1 -peer 10.10.10.10
Note: I am intentionally trying to use an IP address that will not match the IP of the other computer to see the effect of the other side communicating.
Once the subscriber application is running and waiting then start the other side:
Computer2> rtiddsping -pub -transport 1 -peer 239.255.0.1
Note: The publisher is started with peers set to the multicast address that the subscriber application should be listening to (it is the default multicast all applications listen to, unless otherwise specified). You should immediately see data being received on the subscriber side.
Note: For this test it is important to start them in the order I specified: first Computer1 and then Computer2. That way the initial announcements that Computer2 sends to the multicast address will be received by Computer1, which is already running.
If this works, it proves that multicast works well from Computer2 -> Computer1. Then try the reverse:
First on Computer2:
Computer2> rtiddsping -sub -transport 1 -peer 10.10.10.10
Computer1> rtiddsping -pub -transport 1 -peer 239.255.0.1
Gerardo,
That works just fine in each direction regardless of Windows 7 or Windows XP on either end.
More specifically what we are seeing...
We have 2 identical applications exchanging an application level heart beat - each app sends it out at 1 second intervals. The problem we're seeing is that the Windows XP machine is seeing the heartbeat come in about once every 75 seconds even though we know the sender (Windows 7) is transmitting at 1 second intervals. The other way seems okay. It just seems that once discovery is done the actual exchange of messages is very delayed.
Thanks,
Jim
Hi Jim,
This is very odd. I cannot think of anything that would cause that other than the sending application is not really sending the message every second... But I assume that you have already tried to put a printf() or something similar that verifies that you are actually calling the DataWriter.write() once per second...
I certainly do not suspect discovery. Even if discovery were having issues, once you have discovered the application and received the first message, it would not time out for several minutes, so you would get a continuous stream of heartbeats for a while and then nothing. So the behavior you describe cannot be explained by this.
What is the reading application doing? Did it install a listener to capture the heartbeat? Is it waiting on a WaitSet? If a WaitSet, did you attach some condition that would make the DataReader wake up when the heartbeat message arrives? I am shooting in the dark a bit here, because none of this would behave differently on Windows XP versus Windows 7.
This is what I would do to trouble-shoot it:
(1) If you have not done so, put a printf() after the DataWriter.write() operation and print the return value. This will ensure you are actually writing each second and that the return code is RETCODE_OK.
(2) Try to reproduce it using rtiddsping. It takes a parameter that allows you to configure the send rate. I assume this will work fine...
(3) Run your application on Windows 7 as normal and "rtiddsspy -print" on the Windows XP machine. Does it get the sample you send each second or not?
(3.1) If rtiddsspy on Windows XP is not getting the messages from the Windows 7 machine, then run Wireshark. You can use the one that comes with the RTI Connext installation or download one from http://www.wireshark.org/ Both come with a dissector for RTPS packets installed. Capture the packets and verify that your sending application is sending the message once per second to the receiver application... If you want, you can save one of the captures (the PCAP file) and attach it to this thread, and I can take a quick look at it.
(3.2) If rtiddsspy is getting the messages once per second, then I would suspect the reception logic. If you are not using a listener on the heartbeat DataReader, I would start by trying that, just to make sure it works that way, and then transition to the logic you are using.
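Steps (1) and (3) both come down to checking that events are spaced roughly one second apart. Here is a minimal sketch in Python of that check, assuming you have logged a timestamp next to each write() call or each received sample. The function name find_gaps, the tolerance, and the sample timestamps are all illustrative and not part of any RTI API.

```python
def find_gaps(timestamps, expected_period=1.0, tolerance=0.5):
    """Return (index, gap) pairs where two consecutive timestamps are
    further apart than expected_period + tolerance seconds."""
    gaps = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > expected_period + tolerance:
            gaps.append((i, gap))
    return gaps

# Made-up log: events at 1 s intervals with one long stall in the middle.
logged = [0.0, 1.0, 2.0, 77.0, 78.0]
for index, gap in find_gaps(logged):
    print(f"sample {index}: {gap:.1f} s since previous sample")
```

Running the same check on the sender's log and on the receiver's log shows on which side the spacing breaks down.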
Gerardo
Hi Gerardo,
When we run rtiddsspy -print, at first it shows all message types periodically, then it starts getting slower and gets nothing after about 10 seconds. If I restart rtiddsspy, it does the same thing again.
I have attached a copy of the DDSListener we are using to receive these messages. So we are using a pretty standard listener. Further investigation makes me believe, however, that something is wrong with our listener. We logged the reads vs writes across our network for the heartbeats and it is hardly a 1-1 correlation. Sometimes, with one write our listener gets 8 reads. And there are times we will write many times without a single read. So, I'm wondering if something is causing the listener to stall and then attempt to catch up, because eventually we do get a bunch of reads.
One other file I attached is what we see periodically from rtiddsspy.
Appreciate your assistance!!
Jim
Hi,
I took a look at your listener. Nothing really stands out there.
The dump from rtiddsspy you attached is not showing any data. All the traffic shown there is discovery traffic. The "data" traffic should have a 'd' or a 'D' in the Info column (see below); the 'W' and 'R' just indicate discovery traffic, which is not periodic.
If you run
rtiddsspy -hOutput
it will display help on what the letters and symbols in the Info column mean. For example, "W" means a DataWriter was discovered and "N" means this DataWriter was not seen before by rtiddsspy. "d" means data was received for an unkeyed topic (otherwise it would be "D"). The first data sample has "+N"; the remaining ones have "+M", indicating that the instance (in this case, with no key, there is just one instance) was already seen and is now modified by the received sample.
In your rtiddsspy log all I see is "R ?M" and "W ?M". These all indicate a loss of discovery liveliness. In other words, rtiddsspy is not getting the liveliness messages from the other application. I wonder if you are getting the initial discovery and nothing else afterwards... How many NICs (network interfaces) do your computers have? If you have more than 4 it could be a problem. RTI Connext DDS will only use the first 4; if you had, say, 8 and the first 4 were "internal addresses" that are not reachable from the other computer, then multicast messages would go through but not the unicast ones.
Let me recap to make sure I understand what you are saying. Let me know if any of these statements are wrong:
(1) If you run (in isolation):
Windows7> rtiddsping -sendPeriod 1
WindowsXP> rtiddsspy -print
This works as expected, rtiddsspy keeps getting data every 1 second and continues to do so over time.
Can you verify this? The output you sent from rtiddsspy does not show this.
(2) If you run (in isolation):
Windows7> yourPublisherApplication
WindowsXP> rtiddsspy -print
This receives data periodically as expected, but only for the first 10 seconds or so. Then nothing afterwards.
If you leave the publisher application running and restart rtiddsspy, the same behavior is repeated: it gets data for 10 seconds or so, then no data.
(3) If you run (in isolation):
Windows7> yourPublisherApplication
WindowsXP> yourSubscribingApplication
Then you get strange behavior. There is no 1-to-1 correlation between reads and writes. Sometimes a single write results in 8 reads. Sometimes many writes result in no reads.
I am still quite puzzled by all this. If (1) is really true then it cannot really be the listener code, because rtiddsspy is exhibiting the same behavior. So this would point to the publisher; some sort of blocking behavior... But if you are logging the writes and they are succeeding, then that cannot be explained either...
Are you able to do a Wireshark capture that exhibits scenario (1)?
Gerardo
Gerardo,
It turns out the issue was caused by an internal application message queue falling behind. Thus, it appeared the messages were not arriving but were instead getting stuck in the recv message queue. Each received message was being handled with a database access and Windows XP turned out to be EXTREMELY slow on those accesses causing the message queue to fill up. On Windows 7 the queue rarely had more than 1 message in the queue.
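For anyone hitting something similar, the effect is easy to reproduce on paper. Below is a rough Python sketch of a single consumer draining a queue that is fed once per second. The handler times used (0.1 s and 3.0 s) are made-up numbers for illustration, not measurements from our systems, and simulate_backlog is just a name for this example.

```python
def simulate_backlog(arrival_period, service_time, duration):
    """Simulate one consumer draining a FIFO queue.

    A message arrives every arrival_period seconds and takes service_time
    seconds to handle. Returns (arrival_time, handled_time) pairs for all
    messages that arrive before `duration` seconds have passed.
    """
    results = []
    consumer_free_at = 0.0
    count = int(duration / arrival_period)
    for i in range(count):
        arrival = i * arrival_period
        # Handling starts when the message has arrived AND the consumer is idle.
        start = max(arrival, consumer_free_at)
        consumer_free_at = start + service_time
        results.append((arrival, consumer_free_at))
    return results

fast = simulate_backlog(1.0, 0.1, 60)  # handler keeps up with arrivals
slow = simulate_backlog(1.0, 3.0, 60)  # handler slower than arrivals
print("fast handler, delay of last message:", fast[-1][1] - fast[-1][0])
print("slow handler, delay of last message:", slow[-1][1] - slow[-1][0])
```

With the slow handler the delay keeps growing for as long as messages keep arriving, so from the application's point of view the heartbeats look late even though they reached the machine on time.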
Thanks for your assistance - using the suggestions you provided was key to helping us resolve the issue. As always, your help was fantastic!
Jim