Reliable Vs BestEffort latency/throughput results in Perftest

Hi,

The following are the results that I get for reliable communication:

Publisher:
Length:   100  Latency: Ave    186 us  Std   33.0 us  Min    136 us  Max    270 us 50%    182 us 90%    232 us  99%    270 us 99.99%    270 us

Subscriber:
Length:   100  Packets:   500000  Packets/s(ave):   32208  Mbps(ave):    25.8  Lost: 0

And the following with BEST_EFFORT communication (-bestEffort command-line parameter):

Publisher:
Length:   100  Latency: Ave  44587 us  Std 10993.3 us  Min    196 us  Max  49477 us 50%  48210 us 90%  49223 us  99%  49477 us 99.99%  49477 us

Subscriber:
Length:   100  Packets:   375489  Packets/s(ave):   38741  Mbps(ave):    31.0  Lost: 124511

With best effort I expected higher throughput, which I do observe (31.0 Mbps vs. 25.8 Mbps), but also lower latency, which is certainly not the case (44587 us vs. 186 us).

Can you please explain why I observe a higher latency with best effort than with reliable communication? I would expect the opposite, since with reliable communication the publisher/subscriber resend samples, which can increase the latency values.

From the code, I infer that latency is calculated simply by subtracting the subscriber's timestamp from the current time.

In the function processMessage(Message& message):

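    // 'now' (defined outside this excerpt) is the current local time in microseconds.
    // Rebuild the 64-bit microsecond timestamp carried in the message from its two 32-bit halves.
    sec = message.timestamp_sec;
    usec = message.timestamp_usec;
    sentTime = ((unsigned long long)sec << 32) | (unsigned long long)usec;
    if (now >= sentTime)  {
        // round-trip time in microseconds
        latency = (unsigned long)(now - sentTime);
        // keep track of one-way latency: halve the round trip
        latency /= 2;
    }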

Another issue I have with best-effort communication is that sometimes all the latency echoes from the subscriber are lost, and I get no latency results at all on the publisher side. No matter how high I set the numIter parameter (100000, 500000, etc.), there are runs in which every latency echo is lost.

Can I make only the latency pings reliable, so that I get results on the publisher side? Or does this make no sense at all, and would it produce meaningless throughput/latency results because we would be using best effort for normal samples and reliable for latency pings?

thank you!

Hello,

The way latency is measured within perftest is that the publisher sends out a latency ping message and the subscriber sends back a latency response message. The latency is then calculated by taking a timestamp on receipt of the latency response message and subtracting the timestamp of the original send of the latency ping message. All of this is done within the publishing side of perftest, so both timestamps are taken on the same node using the same clock.

What could be happening here is that the subscribing node is CPU-limited. If the CPU of the subscribing application is pegged at 100%, then little to no time is being allocated to processing the latency ping message from the publisher. Can you check whether the CPU usage of the subscribing node is very high? If your subscribing-side CPU usage (or publishing-side CPU usage, for that matter) is very high, you may want to throttle back the throughput somewhat to allow the CPUs to catch up. You can do this by setting a SpinRate value in the perftest setup.

To change the number of latency pings, update the -latencyCount parameter. This parameter lets you specify the number of data publications sent on the throughput topic between latency pings. The smaller this number is, the more latency pings will be sent.

Hi,

In addition to what Bert said, I would like to explain how latency is measured in perftest.

Latency is measured with a two-step approach in order to avoid clock-synchronization issues. Every -latencyCount samples, one sample is turned into a latency ping, which just means changing an ID defined within the message. When the Throughput DataReader (in the Subscriber application) receives that "special" message, it checks whether it is a latency ping and, if so, sends back THE SAME MESSAGE (that is, with the same source timestamp). Because DataReaders cannot send samples, the message is sent back using what is called the Latency DataWriter.

In the Publisher application, a Latency DataReader listens to the Latency DataWriter's messages. When it receives the message back, it subtracts the source timestamp from the current time (giving the round-trip time) and divides the result by two to obtain the one-way latency.

Let's clarify some points:

- The latency messages that go from the Publisher side to the Subscriber side use THE SAME TOPIC as the throughput messages, so they have the same QoS settings. It follows that if you set the throughput messages to BEST EFFORT, the latency pings will also be BEST EFFORT on the Publisher-to-Subscriber path.

- The latency messages that go back from the Subscriber side to the Publisher side use a DIFFERENT TOPIC than the throughput messages, so they have their own QoS settings (LatencyQos). However, the following lines (RTIDDSImpl.cxx) apply the same reliability configuration to both LatencyQos and ThroughputQos:

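// Every topic except the Announcement topic (i.e. both Throughput and Latency)
// takes its reliability from the -bestEffort command-line switch.
if (strcmp(topic_name, perftest_cpp::_AnnouncementTopicName) != 0)
{
    if (_IsReliable)
    {
        // Reliable: keep every sample until acknowledged; write() may block
        // indefinitely while the writer's resources are exhausted.
        dw_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;
        dw_qos.history.kind = DDS_KEEP_ALL_HISTORY_QOS;
        dw_qos.reliability.max_blocking_time = DDS_DURATION_INFINITE;
    }
    else
    {
        // Best effort: no acknowledgments or repairs; the writer does not wait for readers.
        dw_qos.reliability.kind = DDS_BEST_EFFORT_RELIABILITY_QOS;
        dw_qos.history.kind = DDS_KEEP_ALL_HISTORY_QOS;
    }
}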

Now that you know this, you can try modifying this code snippet (and the corresponding DataReader code) so that it only changes the ThroughputQos settings of the DataWriter and DataReader and leaves the LatencyQos ones alone. This can easily be done by using the following comparison in the if:

if (strcmp(topic_name, perftest_cpp::_ThroughputTopicName) == 0) 

Then add the reliability QoS settings you want to the LatencyQos profile in the XML file.

The key point here is that even if you make the return path RELIABLE, I think you will still get large latencies. I believe the reason you are getting large latencies is the time every sample has to spend waiting in the DataReader queue after it arrives at the Subscriber application. With Reliable communication (Throughput), the writer queue is limited to 50 samples by default and write() blocks when that queue is full, which keeps latency at normal levels. But with Best Effort, the writer writes continuously without waiting for the DataReader. And in Perftest, the DataReader has a default max_samples setting of 10000. When that queue is full, latency pings have to wait a long time in it, so you see a large latency.

If I wanted better latency with Best Effort communication, I would try reducing the DataReader's max_samples. You can do that in the XML file (ThroughputQos profile).

I hope all this information helps you.

Juanjo Martin

Hi,

Thank you for your response. It makes sense that in the best_effort case the DataWriter writes at its maximum speed and the DataReader might not get enough CPU time to do its latency processing. I will try -spin or a sleep between writes.

Hi Juanjo,

Thank you for your response. I do not understand the following:

"But when you use Best Effort, the writer is writing all the time without worrying about the DataReader. And in Perftest, the DataReader has by default a max samples configuration of 10000... When that queue is full, the latency pings have to wait a lot of time in that queue and so, you see a big latency"

I thought 10000 was the default number of samples after which a latency ping is sent. Is it actually the size of a queue on the DataReader side? I also don't quite follow how sending latency pings more often would help reduce the observed latency values:

"If I wanted to see a better latency with a Best Effort communication, I would give a try to reduce the DataReader's max samples. You can do that in the XML file (Throughput profile)."


Can you please explain this in more detail.

thanks!

Hi Shweta,

We are talking about two different settings. The 10000 I am referring to is max_samples in DataReaderQos.resource_limits (look for <max_samples>10000</max_samples> in the DataReaderQos defined in the ThroughputQos profile).

Take a look at the XML file and let me know if you find it or not.

Regards,

Juanjo Martin