Abnormal speed of sending data

Joined: 09/10/2015
Posts: 3

Hello, I have developed a communication module based on RTI Connext DDS and am trying to integrate it into our system. At first there was no problem with the publish speed, but recently the publish speed for some data types has become abnormal, while others are unaffected.

For example, take two applications, S1 and S2, and three computers, C1 to C3. Both applications use the same communication module and the same DDS version. The computers have the same hardware, the same operating system (Win7), and the same development platform (VS2008 SP1).

First, I defined a struct to describe the data; S1 sends it and S2 receives it. With S1 on C1 and S2 on C2, the publish speed becomes very slow. By debugging I found that DDS takes a long time (>= 3 s) to send one sample. When I place S1 on C2 or C3, the speed reaches 15 samples/s (debug) and 60 samples/s (release); I set the target rate to 60 samples/s. I don't know what causes this problem, and I guessed that the data type might be too complex.

So I converted the data from a struct to a string (length 5 KB), thinking that this would solve the problem. But the problem still exists, and it now takes even longer (>= 10 s) to send one sample. (C1/S1) to (C2/S2) is normal, (C1/S1) to (C3/S2) is abnormal, and (C2/S1) to (C3/S2) is normal. When S1 and S2 run on the same computer, the speed is normal. Furthermore, while (C1/S1) to (C2/S2) is normal, if I close S2 on C2 and run S2 on C3, the speed becomes abnormal; and once (C1/S1) to (C3/S2) is abnormal, if I close S2 on C3 and start S2 on C2, the speed remains abnormal.

This strange situation confuses me. What causes the problem, and what should I do to solve it?

Thanks,

gabrieldeng

PS: We are running RTI Connext DDS v5.1.0. The sending speed for some data is normal, for example a data type containing 3 int variables and 2 string variables.

Joined: 07/23/2012
Posts: 48

Hi gabrieldeng,

You mention that DDS spends too much time to send just one sample. How do you know this? Are you measuring just the time to do write()? 3 seconds is a lot. Are you using BEST_EFFORT or RELIABLE? In case you are using RELIABLE, can you try with BEST_EFFORT?
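One way to answer the measurement question is to time only the write call itself, separately from serialization or application logic around it. The following is a minimal sketch in Python rather than the Connext API: `fake_write` is a hypothetical stand-in for the application's real write() call, and the 5120-byte payload and the 20 iterations are assumptions chosen for illustration.

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def summarize(durations):
    """Return (min, mean, max) of a list of durations in seconds."""
    return min(durations), sum(durations) / len(durations), max(durations)

def fake_write(sample):
    # Hypothetical stand-in for the application's write() call.
    time.sleep(0.001)
    return len(sample)

durations = []
for _ in range(20):
    _, elapsed = time_call(fake_write, b"x" * 5120)
    durations.append(elapsed)

fastest, mean, slowest = summarize(durations)
print(f"min={fastest * 1000:.2f} ms  mean={mean * 1000:.2f} ms  max={slowest * 1000:.2f} ms")
```

Looking at the maximum as well as the mean helps distinguish a call that is uniformly slow from one that is usually fast but occasionally blocks for a long time.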

My theory is that at some point, the send window gets full (if you are using it) and it blocks internally until new space is available.

Of course, the ideal solution is not to switch to BEST_EFFORT but to find why even with RELIABLE it can't give good performance for a small publication rate (60 samples/s should be easily achievable).

I think that the best way to proceed here would be to provide me with the QoS file or even the whole application. If you don't want to attach it, please feel free to email it to me at juanjo@rti.com.

Thanks,

Juanjo Martin

Hi Juanjo Martin, first of all, thanks for your help.

Yes, I measure the time that write() takes. We are using RELIABLE; I have not tried BEST_EFFORT because it does not fit our needs. Our applications use the same communication module and QoS.

Over the last few days I have tried to gather more information.

First, I looked at the application's performance data as reported by Windows. The result: running time is 384 s, CPU time is 3 s to 4 s, and CPU usage is 0%; there is no disk I/O; the network statistics show a transmission rate of 13,000 B/s to 16,000 B/s, whereas the normal rate can reach 9,892,182 B/s. This result only appears when the sending and receiving applications run on two specific computers, and the speed problem is one-way. For example, from Computer 1 to Computer 2 the speed is slow, but from Computer 2 to Computer 1 it is normal. The problem affects only the sending application and does not affect the receiving side (when I run the sending application on Computer 3, the receiving rate on Computer 2 increases).
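As a rough sanity check on these figures, the sketch below compares the byte rate needed for the target publish rate with the observed rates. It assumes that "5 KB" means 5120 bytes of user data per sample and ignores protocol overhead, so the numbers are only approximate bounds.

```python
def samples_per_second(bytes_per_second, sample_bytes):
    """Upper bound on the sample rate that a given byte rate can carry."""
    return bytes_per_second / sample_bytes

SAMPLE_BYTES = 5 * 1024  # assumption: "5 KB" means 5120 bytes of user data
TARGET_RATE = 60         # samples/s, the configured publish rate

# Bytes per second needed to sustain the target rate, ignoring headers.
required = SAMPLE_BYTES * TARGET_RATE
print(f"required: {required} B/s")

for observed in (13_000, 16_000, 9_892_182):
    rate = samples_per_second(observed, SAMPLE_BYTES)
    print(f"{observed} B/s carries at most {rate:.1f} samples/s")
```

Under these assumptions the slow link carries only a few samples per second, far below the 60 samples/s target, while the normal rate leaves ample headroom.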

Second, I wondered what would happen if I shortened the string, so I tried different lengths and found that a threshold exists. When the length is 930 B, the speed between the two specific computers is normal. When it is 931 B or 932 B, sending pauses for 1 s to 2 s within about 8 s of sending data. When it is above 932 B, the speed problem appears again.
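A threshold like this can be located with far fewer trials by bisecting over the payload size instead of trying lengths one by one. The sketch below assumes the behaviour is monotonic (every size above the threshold is slow). `simulated_probe` is a hypothetical stand-in that simply encodes the 930 B observation; a real probe would publish samples of the given size and compare the measured write() time against a limit.

```python
def find_threshold(is_slow, low, high):
    """Return the smallest size in (low, high] for which is_slow(size) is True.

    Requires is_slow(low) to be False and is_slow(high) to be True, and
    assumes the result does not flip back to False between them.
    """
    if is_slow(low) or not is_slow(high):
        raise ValueError("need a fast lower bound and a slow upper bound")
    while high - low > 1:
        mid = (low + high) // 2
        if is_slow(mid):
            high = mid  # threshold is at mid or below
        else:
            low = mid   # threshold is above mid
    return high

def simulated_probe(size):
    # Hypothetical stand-in for a real measurement: sizes above 930 bytes
    # are treated as slow, matching the observation reported above.
    return size > 930

threshold = find_threshold(simulated_probe, 1, 5120)
print(threshold)
```

With a range of 1 to 5120 bytes this needs about 13 probes, which matters when each slow probe takes several seconds.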

Finally, I reasoned that since the applications use the same communication module, the same QoS, and the same copy of RTI Connext DDS, perhaps the problem is caused not by DDS but by Windows. So I checked the computers where the problem appears and found that they use different copies of Windows (though all are Win7). There is no problem between computers that use the same copy of Windows, so I realized that the problem might be caused by a difference in Windows configuration. I reinstalled Windows, making sure that the computers use the same copy. After that I tested the applications again, and this time the problem disappeared. So I am sure that some Windows configuration affects DDS and causes the problem.

So, which Windows configuration could affect DDS and cause the speed problem? How can I query and change it to solve the problem?

Thanks very much!

gabrieldeng

PS: The QoS file is attached.

Hi gabrieldeng,

It seems that you have a problem in the link between the apps running in computers 1 and 2: network stack and NIC in computer 1 -> wire -> NIC and network stack in computer 2. The fact that you see the problem with sizes greater than a threshold makes me think about IP Fragmentation and losses at that level. Unfortunately, all the issues I have seen in the past are related to the subscribing machine, not to the publishing one. So I don't currently know what is going on in your system.
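To illustrate why fragmentation combined with losses can hurt larger samples disproportionately, here is a minimal sketch that estimates how many IPv4 fragments one UDP datagram needs and how likely the whole datagram is to be lost when fragments are dropped independently. The 1500-byte MTU, the 20-byte IPv4 header without options, and the 1% per-fragment loss rate are assumptions; RTPS adds its own headers on top of the user data, so real payloads are somewhat larger than the sample size.

```python
import math

def ipv4_fragment_count(udp_payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    datagram = udp_payload_bytes + udp_header
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return max(1, math.ceil(datagram / per_fragment))

def datagram_loss(fragment_loss, fragments):
    """Probability that a datagram is lost if any one fragment is lost."""
    return 1 - (1 - fragment_loss) ** fragments

for size in (930, 1472, 1473, 5120):
    n = ipv4_fragment_count(size)
    p = datagram_loss(0.01, n)  # assumed 1% independent fragment loss
    print(f"{size} B payload: {n} fragment(s), datagram loss ~{p:.2%}")
```

Because losing any single fragment discards the whole datagram, a reliable protocol then has to resend the entire sample, which is one way a modest loss rate can turn into a large drop in throughput.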

However, I would like to know what is going on in your system and help you debug it. Can you tell me which version of Windows is giving problems and which one is not? Maybe something about it has already been posted on a Microsoft webpage. I assume that the version giving problems is older... right?

Another great way to get more information is to use Wireshark. We have our own customized version of Wireshark that can be found in this community portal:

https://community.rti.com/downloads/rti-wireshark

If you need help analyzing the traffic capture, please provide me with it and I will do it. Having two captures of the issue happening and not happening would be great (size greater and smaller than the threshold).

Note that at this point, DDS is out of the picture. The problem is related to the transport you are using (I suspect UDP) and the layers below it.

Thanks,

Juanjo Martin

Hi Juanjo Martin,

Our applications now run correctly, and the problem has not occurred since I reinstalled Windows. My copy of Windows is Professional; the copy that caused the problem is Ultimate and was installed by a former colleague, who may have used some software to modify the system.

I will set up a new computer with the Ultimate edition and test again to determine whether the edition itself or the modification caused the problem.

Thanks for your help.

gabrieldeng